41 results about "UTF-8" patented technology

UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The encoding is defined by the Unicode Standard, and was originally designed by Ken Thompson and Rob Pike. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.
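
The one-to-four-byte layout described above can be illustrated with a minimal sketch. The function name `encode_code_point` is hypothetical and chosen only for this example; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (one to four bytes)."""
    # Surrogates (U+D800..U+DFFF) are not valid scalar values, which is why
    # the count of encodable code points is 0x110000 - 0x800 = 1,112,064.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in ("A", "é", "中", "😀"):
        # Cross-check the hand-written encoder against the built-in codec.
        assert encode_code_point(ord(ch)) == ch.encode("utf-8")
        print(ch, encode_code_point(ord(ch)).hex(" "))
```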

Method and Apparatus for XML Parsing Using Parallel Bit streams

One embodiment of the present invention is an apparatus that processes XML, which apparatus comprises (a) an XML interface module that applies Document Type Definitions, XML Schema, XPath expressions and other XML model information to an XML model processor and applies XML character stream data to a parallel bit stream module, (b) an XML model processor that supplies symbol table entries to an XML symbol table module and regular expressions for validating XML data values to a regular expression compiler, (c) an XML symbol table module that stores symbol table entries for later use in parsing, (d) a regular expression compiler that produces dynamic executable code for validating regular expressions using parallel bit streams, (e) a lexical item stream module that generates lexical items relevant to XML parsing and to validation of compiled regular expressions, (f) a transcoder that converts UTF-8 to UTF-16 as required, (g) a parser that makes parsing decisions in response to character streams in combination with lexical item streams and (h) a parsed data receiver to receive parsed data items from the parser.
Owner:INT CHARACTERS INC

Systems and methods of utf-8 pattern matching

Systems and methods are described for efficiently processing, searching and/or rewriting variable-width encoded data, such as UTF-8 encoded data. Embodiments of the systems and methods modify and adapt search algorithms, such as the Horspool and Wu-Manber algorithms, to efficiently process and manage searching of variable-width encoded text in large blocks of text, such as text that may be carried via a stream of packets through a network device, such as an intermediary device.
Owner:CITRIX SYST INC
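
For orientation, here is a minimal sketch of the plain (unmodified) Horspool algorithm run directly over UTF-8 bytes. It is not the adapted algorithm the patent claims; the function name `horspool_find` is hypothetical. It relies on a general property of UTF-8: lead bytes and continuation bytes use disjoint value ranges, so a byte-level match of a valid UTF-8 pattern in valid UTF-8 text always starts on a character boundary.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: how far the window may shift when the byte
    # aligned with the last pattern position is b.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift[haystack[pos + m - 1]]
    return -1


if __name__ == "__main__":
    text = "naïve café, 中文 search".encode("utf-8")
    pattern = "中文".encode("utf-8")
    offset = horspool_find(text, pattern)
    # Convert the byte offset back to a character index.
    print(offset, len(text[:offset].decode("utf-8")))
```

Searching bytes rather than decoded characters keeps the shift table at 256 entries, which is one reason byte-oriented search is attractive for variable-width encodings.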

Embedded type terminal and UTF-8 and GB2312 code conversion method thereof

Inactive | CN101655836A | Troubleshoot character handling issues | Solve the problem that characters cannot be processed | Special data processing applications | UTF-8 | Operational system
The invention discloses an embedded terminal and a UTF-8 and GB2312 code conversion method thereof. The UTF-8 and GB2312 code conversion method, based on the embedded mobile terminal, comprises the following steps: receiving a request from an application program to convert a GB2312 code into a UTF-8 code; reading a GB2312 code character in the embedded terminal according to the request; converting the read GB2312 code character into a Unicode code character; directly converting the resulting Unicode code character into a UTF-8 code character; and returning the converted UTF-8 code character to the application program. In an environment without a Windows or Linux operating system and without other usable APIs, the UTF-8 and GB2312 code conversion method can solve the character handling problem in communication between the embedded mobile terminal and a background server.
Owner:XIAMEN STELCOM INFORMATION & TECH
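
The two-stage conversion (GB2312 to Unicode, then Unicode to UTF-8) can be sketched as follows. This sketch leans on Python's built-in `gb2312` codec, which is exactly the kind of ready-made API the patent says is unavailable on its target terminals; an embedded implementation would presumably replace the first stage with its own lookup table. The function name `gb2312_to_utf8` is hypothetical.

```python
def gb2312_to_utf8(data: bytes) -> bytes:
    """Convert GB2312-encoded bytes to UTF-8 via an intermediate Unicode string."""
    text = data.decode("gb2312")   # stage 1: GB2312 bytes -> Unicode code points
    return text.encode("utf-8")    # stage 2: Unicode code points -> UTF-8 bytes


if __name__ == "__main__":
    gb = "中文".encode("gb2312")
    print(gb.hex(" "), "->", gb2312_to_utf8(gb).hex(" "))
```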

Method and apparatus for processing character streams

An embodiment is a method for processing a character stream including: (a) forming, responsive to the character stream, a plurality of parallel property bit streams wherein each of the parallel property bit streams includes bit values of a particular property associated with data values of the character stream; and (b) processing the parallel property bit streams. For example, the method applies to character streams encoded in accordance with fixed-width character encoding schemes, for example, ASCII, or variable length character encoding schemes, for example, UTF-8.
Owner:INT CHARACTERS INC
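
A minimal sketch of the idea of parallel property bit streams, using Python integers as bit vectors. The function names and the choice of property (UTF-8 continuation bytes) are assumptions made for illustration; the patent targets SIMD-style processing, which this pure-Python version only imitates.

```python
def to_bit_streams(data: bytes) -> list[int]:
    """Transpose a byte stream into 8 parallel bit streams.

    streams[k] is an int used as a bit vector: bit i is set when byte i
    of the input has its k-th most significant bit set.
    """
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if byte & (0x80 >> k):
                streams[k] |= 1 << pos
    return streams


def continuation_stream(data: bytes) -> int:
    """Property stream marking UTF-8 continuation bytes (pattern 10xxxxxx)."""
    s = to_bit_streams(data)
    mask = (1 << len(data)) - 1
    # One bitwise expression classifies every byte position at once.
    return s[0] & ~s[1] & mask


if __name__ == "__main__":
    data = "aé中".encode("utf-8")
    print(format(continuation_stream(data), f"0{len(data)}b"))
```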

Method, apparatus and server for processing data packet

The present invention provides a method, an apparatus and a server for processing a data packet. The method comprises the steps of: generating XMPP (Extensible Messaging and Presence Protocol)-based message data to be transmitted, wherein the XMPP-based message data comprises a message header and a message body; performing self-defined packaging for the message header to obtain a packaged message header; converting the message body into a message body conforming to a preset protocol format; packaging the packaged message header and the message body conforming to the preset protocol format to obtain a new message packet; and transmitting the new message packet. According to the method, the apparatus and the server of the present invention, a communication mode of data protocol packets is adopted, an XMPP data packet is packaged by a protocol header, integrity of data is ensured through length field of the protocol header; in addition, communication quality of different types of data packets is defined, so that communication quality modes of different types of data packets can be ensured; and an original packet body is converted into UTF-8 byte stream for processing, thereby facilitating compression with a compression algorithm.
Owner:CHINA MOBILE GRP GUANGDONG CO LTD

UTF-8 (8-bit Unicode transformation format) and ANSI (American national standards institute) code identification method and device

The embodiment of the invention discloses a UTF-8 (8-bit Unicode transformation format) and ANSI (American National Standards Institute) code identification method for identifying whether a file uses the UTF-8 encoding or an ANSI encoding. This avoids garbled characters being displayed when the file is parsed with the wrong encoding. The method in the embodiment comprises the following steps: S1, acquiring a data stream of the file; S2, storing the data stream in an array in byte form; S3, judging whether preorder bytes exist in the array, and if so, deleting the preorder bytes, then executing step S4; S4, judging whether a first byte exists in the array, and if so, deleting the first byte, then executing step S5; S5, judging whether a second byte or a third byte exists in the array; if so, the encoding of the file is ANSI, otherwise it is UTF-8. The embodiment of the invention also provides a UTF-8 and ANSI code identification device.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD
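
The abstract does not spell out what its "preorder", "first", "second" and "third" bytes are, so the following is not a reconstruction of steps S1 to S5. It is a sketch of a common heuristic for the same goal: treat a UTF-8 byte order mark as decisive, otherwise test whether the bytes form valid UTF-8. The function name `guess_encoding` and the returned labels are hypothetical.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return "utf-8" or "ansi" for the given file contents (heuristic)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        return "ansi"


if __name__ == "__main__":
    print(guess_encoding("中文".encode("utf-8")))
    print(guess_encoding("中文".encode("gbk")))
```

Pure ASCII input is reported as UTF-8 here, which is harmless because ASCII decodes identically under both interpretations.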

Method and device of adding and connecting hidden Chinese wifi hotspot

The invention discloses a method and a device for adding and connecting to a hidden Chinese wifi hotspot. According to the method, when the hidden Chinese wifi hotspot is added, the wifi hotspot name is stored in the wifi configuration file in both UTF-8 encoding and GBK (Chinese Internal Code Specification) encoding. Existing technology stores the name in UTF-8 encoding only; this scheme transcodes the name before storage, so that the hotspot name is stored in the configuration file in both encodings. When a terminal scans for wifi hotspots, each wifi hotspot name in the configuration file is packaged in turn into an 802.11 Probe Req frame and transmitted. When a hotspot configured with a Chinese name uses GBK encoding, it can still be successfully matched with an SSID (Service Set Identifier) transmitted by the terminal, which solves the problem that an intelligent terminal is unable to add and connect to the hidden Chinese wifi hotspot.
Owner:GUANGDONG OPPO MOBILE TELECOMM CORP LTD
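
A minimal sketch of the "store both encodings" idea: produce the byte forms of a hotspot name that a terminal could try in turn. The function name `ssid_candidates` is hypothetical, and the sketch ignores the configuration file format and frame construction entirely.

```python
def ssid_candidates(name: str) -> list[bytes]:
    """Return the distinct byte encodings of an SSID, UTF-8 first, then GBK."""
    candidates: list[bytes] = []
    for encoding in ("utf-8", "gbk"):
        try:
            raw = name.encode(encoding)
        except UnicodeEncodeError:
            continue  # the name has characters this encoding cannot express
        if raw not in candidates:
            candidates.append(raw)  # ASCII-only names yield a single entry
    return candidates


if __name__ == "__main__":
    for raw in ssid_candidates("我的热点"):
        print(len(raw), raw.hex(" "))
```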

Method and Apparatus for XML Data Processing

Method and apparatus for at least one of coding or decoding of data. The method comprises retrieving Extensible Markup Language (“XML”)-Unicode Transformation Format 8 (“UTF-8”) data, confirming that the XML-UTF-8 data is in a proper format, converting a prolog located within said XML-UTF-8 data, initializing a tag and attribute lookup table, comparing a current character to a plurality of multi-character patterns, determining whether said current character can be converted to a multi-character pattern in said plurality and Unicode, converting said current character to one of ASCII and Unicode when said current character cannot be converted to said multi-character pattern in said plurality, comparing at least one subsequent character to said plurality of multi-character patterns to determine conversion of at least the current character when said current character can be converted more than one way, and determining whether there are more characters.
Owner:TEXAS INSTR INC

An automatic archive classification method based on an extreme learning machine

The invention relates to an automatic archive classification method based on an extreme learning machine. The method comprises a learning stage and a running stage. The first stage requires a preprocessing module, whose main function is to standardize the data and remove information irrelevant to the task. The preprocessing module first unifies the text content into UTF-8 encoding; filters illegal characters by regular expression matching; carries out word segmentation and part-of-speech tagging with the ICTCLAS Chinese lexical analysis system; and finally filters words that appear often in the text but are not significant to text analysis, using a Baidu stop word table. According to the method, the archive content in the text can be accurately understood, an efficient and stable low-dimensional archive dictionary can be constructed, and high classification precision can be guaranteed.
Owner:福建南威软件有限公司

Method for supporting various character sets during application of EXCEL plug-in unit in WEB

The invention discloses a method for supporting various character sets during application of EXCEL plug-in unit in WEB, and belongs to a processing method of a character code set. According to the method disclosed by the invention, after a plug-in unit program receives data of a server, the format of transmission data is firstly converted to UTF-8 (Unicode transformation format-8) format, and the UTF-8 format is further converted to the format of a target code character set. Compared with the prior art, the method disclosed by the invention can improve the compatibility and the stability of an information system, can improve the user experience, and further has great popularization and application values.
Owner:INSPUR SOFTWARE CO LTD

Method and device for extracting Chinese names of people and places

Active | CN105573981A | Solve the shortcomings of high memory usage and slow speed | Reduce dirty data | Natural language data processing | Special data processing applications | UTF-8 | Chinese characters
The invention belongs to the field of natural language processing in computational linguistics, and particularly relates to a method and a device for extracting Chinese names of people and places. The method comprises the following steps: S1, transforming text into an encoding format of UTF-8 (unicode transformation format-8); S2, presetting a text threshold L, judging whether text length T is larger than the threshold L, adopting an extension paragraphing method to paragraph the text and turning to step S3 after paragraphing if T is larger than L, and turning to the step S3 if T is smaller than or equal to L; S3, preprocessing the text to remove dirty data; S4, performing part-of-speech tagging on separate Chinese characters in the preprocessed text, and performing word separation and combination on the tagged separate characters; S5, marking phrases in the text, which are matched with target phrases, and calculating matching results. The method and the device can be widely applied to recognition of named entities in fields of search engines, machine translation, data mining and the like.
Owner:XIAMEN MEIYA PICO INFORMATION

Method and system for coding two-dimensional code colors

The invention provides a method and a system for coding two-dimensional code colors. The method includes the steps: converting characters into a byte array corresponding to utf-8 codes of the characters by a two-dimensional code generator; determining color values corresponding to the characters according to a first preset policy; and drawing the color values in a two-dimensional code data area, forming a two-dimensional code picture and then transmitting the two-dimensional code picture to a calculator by the two-dimensional code generator. A decoder reads and analyzes the two-dimensional code picture to be processed, obtains the color value of the two-dimensional code picture to be processed and informs the calculator, wherein the two-dimensional code picture is stored in the calculator. The calculator inquires a color interval table, obtains the corresponding characters and transmits the corresponding characters to the decoder. The calculator is used for determining the color interval table according to a second preset policy according to the color values corresponding to the characters.
Owner:SUZHOU CODYY NETWORK SCI & TECH

Data encoding method

The present encoding method encodes binary data as sequences of code points occupying the Private Use Area of the Unicode Basic Multilingual Plane. The encoded data can be contained within a stream of UTF-8, UTF-16 or UTF-32 code units and subsequently decoded to yield the original binary data. This method requires minimal processing for both encoding and decoding operations, and yields a 75% storage efficiency limit. Each datum encoding sequence includes type and encoding length information, enhancing parse and search operation performance. The type system includes elements for creating complex structured data-text sequences, and a mechanism for application defined extensions.
Owner:JOYCE STEPHEN ALLYN

Coding method and system of two-dimension code

The invention provides a coding method and a system for a two-dimensional code. In the method, a client converts characters into a byte array corresponding to the characters' UTF-8 encoding, determines the color values corresponding to the characters in accordance with a first preset strategy, draws the color values in the two-dimensional code data region to form a two-dimensional code image, and sends the image to a server. The server reads and analyzes the two-dimensional code image to be processed, obtains its color values, queries a color interval table, and obtains the corresponding characters, wherein the server determines the color interval table on the basis of the color values corresponding to the characters and in accordance with a second preset strategy.
Owner:SUZHOU CODYY NETWORK SCI & TECH

Voice recognition character string processing comparison method based on Pinyin

The invention relates to a Pinyin-based method for processing and comparing voice recognition character strings. When existing voice recognition technology is applied to special cases such as person name recognition and equipment name recognition, errors arise easily from incorrect comparison. The method is a "secondary processing" step built on a general Chinese character recognition algorithm: recognized Chinese character strings are converted into Pinyin strings, and the Pinyin strings are then compared with target Pinyin strings. The method comprises the following steps: 1, Pinyin coding: coding all Chinese character Pinyin, in a manner similar to Unicode coding, and enumerating all Chinese character Pinyin combinations; 2, code conversion: converting character strings that express Chinese characters in encodings such as GBK, Unicode and UTF-8 into Pinyin strings; and 3, polyphone processing: enumerating the polyphones among all family names, handling them specially, and assigning them the same Pinyin codes. According to the method, accurate recognition can be achieved rapidly, so that misjudgment is avoided.
Owner:深圳市艾塔文化科技有限公司

Multilateral language analysis system and multilateral language analysis method for control script programs

Active | CN106569939A | Overcome garbled characters | Overcoming technical issues with "?" | Hardware monitoring | UTF-8 | Byte
The invention relates to the technical field of multilateral languages in program software, particularly to the technical problems existing in unusual multilateral language analysis and display, and discloses a multilateral language analysis method for control script programs. The method comprises the following steps of S200, executing the control script programs one by one; S300, capturing program lines of the script programs one by one, and recording to a TXT file; S400, storing the TXT file content to a first-level cache; S500, converting the TXT file content in the first-level cache into script programs in a byte format line by line; S600, converting bytes according to a UTF-8 format; and S700, converting the converted bytes into the script programs in a character format.
Owner:北京数科网维技术有限责任公司

A cellular communication system, network element and method of operating the same

A base station (101) for a cellular communication system is coupled to a fixed network (107) and receives text strings for transmission to remote units (103, 105). The base station includes a receive processor (111) which receives a first text string encoded in a first encoding format. The receive processor (111) is coupled to a conversion processor (113) which converts the first text string to a second text string at least partially encoded in a second encoding format. The first encoding format may be a UTF-8 or UTF-16 text encoding format and the second encoding format may be an ASCII text encoding format. The conversion processor (113) is coupled to a transceiver (115) which transmits the second text string to the remote units (103, 105).
Owner:MOTOROLA SOLUTIONS INC
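
One way to picture the conversion step is a lossy UTF-8 to ASCII fallback. The abstract does not say how characters without an ASCII equivalent are handled, so the strategy below (strip accents via Unicode decomposition, substitute `?` for the rest) is purely an assumption, as is the function name `utf8_to_ascii`.

```python
import unicodedata


def utf8_to_ascii(data: bytes) -> bytes:
    """Convert UTF-8 text to ASCII, dropping accents and replacing the rest with '?'."""
    text = data.decode("utf-8")
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", errors="replace")


if __name__ == "__main__":
    print(utf8_to_ascii("Café déjà vu".encode("utf-8")))
```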

Universal forum text extraction method

The invention relates to a universal forum text extraction method. The method comprises the following steps: the complete HTML code of a web page is extracted, the page's encoding is detected, and the page is uniformly converted to UTF-8; the HTML label types are analyzed, a DOM tree of the web page is obtained, title information and div label content containing publishing time information are extracted, and the extracted information is classified to generate a list after useless information is filtered; the data length of the list is calculated, and the information is classified with time as a mark and output in a formatted mode. The extraction method is highly universal, can be applied to most forums, and can accurately extract the data fields for main posts, replies, titles and posting time and output them in a formatted mode, so that forum information is better utilized.
Owner:NORTHEASTERN UNIV

System and method for improved utf-8 encoding

Inactive | US20160099724A1 | Improved UTF-8 encoding | Improve efficiency | Code conversion | UTF-8 | Round complexity
The present invention is directed to a method, system, and computer program for improved Unicode encoding (UTF-8C). Specifically, the use of a numeric offset system is employed to reduce coding complexity and to mitigate errors in decoding, as compared to standard UTF-8 encoding. Further, a non-zero null string filter may be used to improve the convenience of internalizing C-strings.
Owner:DOSSEV IVAN

Systems and methods of UTF-8 pattern matching

Systems and methods are described for efficiently processing, searching and/or rewriting variable-width encoded data, such as UTF-8 encoded data. Embodiments of the systems and methods modify and adapt search algorithms, such as the Horspool and Wu-Manber algorithms, to efficiently process and manage searching of variable-width encoded text in large blocks of text, such as text that may be carried via a stream of packets through a network device, such as an intermediary device.
Owner:CITRIX SYST INC

Migration support device

This work PC 1 (migration support device) includes: a character code converting unit (11) that converts an EBCDIK+KEIS code into a UTF-8 code; a program converting unit (12) that converts an input program (22) into an output program (32); an exchange information generating unit (13) which causes the output program (32) to read character data to which the UTF-8 code is allocated, and which generates exchange information E that defines, regarding the read character data, the number of areas on a memory specified by the output program (32) so as to be the same as the number of bytes of a byte sequence expressing the EBCDIK+KEIS code allocated to the character data; and an area storing unit (14) that stores a UTF-8 code allocated to the read character data in the areas the number of which is defined by the exchange information E.
Owner:HITACHI SOCIAL INFORMATION SERVICES LTD

Two-dimensional code processing method and two-dimensional code client-side

The invention provides a two-dimensional code processing method and a two-dimensional code client. The method comprises the following steps: the two-dimensional code client converts characters into byte arrays corresponding to the Unicode Transformation Format-8 (UTF-8) encoding of the characters, determines the color values corresponding to the characters according to a first preset strategy, and draws the color values in a two-dimensional code data area to form two-dimensional code pictures. The two-dimensional code client reads and analyzes the two-dimensional code pictures to be processed, obtains their color values, and queries a color interval list to obtain the corresponding characters, wherein the two-dimensional code client determines the color interval list according to the color values corresponding to the characters and a second preset strategy.
Owner:SUZHOU CODYY NETWORK SCI & TECH

Data exporting method and device and electronic equipment

The present invention provides a data exporting method, apparatus, and electronic device. If the target export file includes both data in UTF-8 encoding and data in a non-UTF-8 encoding, the target export file is split into multiple records, and every record in a non-UTF-8 encoding is converted into a target record in UTF-8 encoding, yielding an export file in UTF-8 encoding. In this way, records in a non-UTF-8 encoding in the target export file are converted into target records in UTF-8 encoding, thereby avoiding garbled characters.
Owner:AGRICULTURAL BANK OF CHINA
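
The split-and-convert idea can be sketched under two assumptions that the abstract does not state: records are newline-delimited, and any record that is not valid UTF-8 is in GBK. The function name `normalize_records` and its `fallback` parameter are hypothetical.

```python
def normalize_records(raw: bytes, fallback: str = "gbk") -> bytes:
    """Return the export file with every record re-encoded as UTF-8."""
    converted = []
    for record in raw.split(b"\n"):          # assumption: one record per line
        try:
            record.decode("utf-8")           # already valid UTF-8: keep as-is
            converted.append(record)
        except UnicodeDecodeError:
            # assumption: non-UTF-8 records use the fallback encoding
            converted.append(record.decode(fallback).encode("utf-8"))
    return b"\n".join(converted)


if __name__ == "__main__":
    mixed = "甲方".encode("utf-8") + b"\n" + "乙方".encode("gbk")
    print(normalize_records(mixed).decode("utf-8"))
```

Working record by record matters because decoding the whole file with a single codec would either fail or garble whichever records use the other encoding.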

Method and computer device for obtaining front-end HTML design table based on Excel template

The invention provides a method and a computer device for obtaining a front-end HTML design table based on an Excel template, comprising the following steps: step 10, importing an Excel form file; step 20, opening a file storage space and an Excel storage space within a storage space; step 30, reading the Excel form file to obtain a file data stream, and storing the file data stream in the file storage space; step 40, extracting the form data stream from the file data stream and storing it in the Excel storage space; step 50, converting the form data stream into an HTML file; step 60, converting the HTML file into an HTML data stream through UTF-8; step 70, extracting the data in the HTML data stream with an extraction regular expression, then converting and cleaning erroneous and garbled codes in the data with a de-garbling regular expression, and finally obtaining the source code of the HTML form.
Owner:FUJIAN CENTM INFORMATION

A cellular communication system, network element and method of operating the same

A base station (101) for a cellular communication system is coupled to a fixed network (107) and receives text strings for transmission to remote units (103, 105). The base station includes a receive processor (111) which receives a first text string encoded in a first encoding format. The receive processor (111) is coupled to a conversion processor (113) which converts the first text string to a second text string at least partially encoded in a second encoding format. The first encoding format may be a UTF-8 or UTF-16 text encoding format and the second encoding format may be an ASCII text encoding format. The conversion processor (113) is coupled to a transceiver (115) which transmits the second text string to the remote units (103, 105).
Owner:MOTOROLA SOLUTIONS INC

Unicode-compatible entropy coding

A character data set is compressed with a compression algorithm module of a computer system to generate one or more streams of encoded values. The compression module is configured to compress the character data set with an entropy encoder to generate one or more streams of encoded values with UTF-8 or UTF-16. A code points mapper assigns the encoded values to code points in a Unicode format. A UTF encoder encodes the streams of assigned encoded values.
Owner:RED HAT

Fast Implementation Of Decoding Function For Variable Length Encoding

An embodiment of the present invention is a method for encoding/decoding data in a variable-length format, used to omit unnecessary pieces of data for the purpose of improving processing performance, reducing the size of data on communication paths and efficiently using limited physical memory. BER compression and UTF-8 encoding of Unicode text are cited as examples of such variable-length encoding. While the amount of data can be reduced through encoding, the data must be restored (decoded) to the original data before it is actually used, which requires a great deal of processing time. One aspect of the present invention is improving decoding by reducing the processing time required to decode the encoded data.
Owner:SAP AG
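
To show where decoding time goes, here is a minimal table-driven UTF-8 decoder. It is a generic sketch, not the optimization the patent describes: the lookup table `SEQUENCE_LENGTH` and the function `decode_utf8` are hypothetical names, and the decoder deliberately skips the overlong-form and surrogate checks that a conforming decoder needs.

```python
# Lead byte -> total sequence length (0 marks a byte that cannot start a sequence).
SEQUENCE_LENGTH = [1] * 128 + [0] * 64 + [2] * 32 + [3] * 16 + [4] * 8 + [0] * 8


def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of code points (simplified validation)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = SEQUENCE_LENGTH[lead]
        if n == 0 or i + n > len(data):
            raise ValueError(f"invalid or truncated sequence at byte {i}")
        if n == 1:
            cp = lead                      # fast path: ASCII needs no bit work
        else:
            cp = lead & (0xFF >> (n + 1))  # keep only the payload bits of the lead
            for cont in data[i + 1:i + n]:
                if cont & 0xC0 != 0x80:
                    raise ValueError(f"bad continuation byte near byte {i}")
                cp = (cp << 6) | (cont & 0x3F)
        code_points.append(cp)
        i += n
    return code_points


if __name__ == "__main__":
    print([hex(cp) for cp in decode_utf8("aé中😀".encode("utf-8"))])
```

The single table lookup replaces a chain of range comparisons on the lead byte, and the ASCII fast path avoids the inner loop for the most common input.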