59 results about "Web extraction" patented technology

Managing and indexing content on a network with image bookmarks and digital watermarks

A method of managing content, and in particular, managing content on the Internet retrieves a web page that includes an image and detects whether the image included within the web page is embedded with a digital watermark. It generates an indicia associated with an image included in the web page that is embedded with a digital watermark. The indicia indicate to the user which images include watermarks. The watermarks may be used to convey links to related web pages or specific information about the images, such as usage rights and licensing information. Variations of this method create image bookmarks to web pages including images using thumbnails of those images. A content management system comprises a first program for retrieving web pages including images. It also includes a second program for extracting an image from a web page, creating a thumbnail of the image, and forming an image bookmark linking the thumbnail to the web page that the image has been extracted from. The thumbnails are used to create a visual index to corresponding web pages from which the images originated on the Internet. A method of visual indexing of content on a network, such as the Internet, retrieves a web page, extracts an image included on the web page, generates a thumbnail of the image, and creates a link between the thumbnail and a location of the web page from which the image has been extracted.
Owner:DIGIMARC CORP
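
A minimal sketch of the visual-indexing idea in the abstract above, using only the Python standard library. It collects the images found on a page and maps each image URL back to the page it came from. The names `ImageCollector` and `image_bookmarks` are hypothetical, thumbnail generation and watermark detection are left out, and none of this is taken from the patent itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_bookmarks(page_url, html):
    """Map each absolute image URL to the page it was extracted from."""
    collector = ImageCollector()
    collector.feed(html)
    return {urljoin(page_url, src): page_url for src in collector.sources}


# Hypothetical page used only for illustration.
PAGE_URL = "https://example.com/gallery/index.html"
HTML = '<p><img src="cat.png"><img src="/img/dog.jpg" alt="dog"/></p>'
bookmarks = image_bookmarks(PAGE_URL, HTML)
```

A real system would store a thumbnail alongside each entry; here the image URL stands in for it.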

Web browser embedded button for structured data extraction and sharing via a social network

The present invention is directed to a system and method with which users can identify database elements in a web page, store an extraction template representing the location and type of the elements on the page, extract and store the product record in their collection, use the extraction template to automatically extract all the data from the web site, and continually check the extraction templates for correctness, updating them if necessary. Additionally, the system provides crowd-sourced creation of web page data record extraction templates to build a database of templates, which others can then use to extract information from web pages at the site where the templates were created and to save that information to a social network. Moreover, the crowd-based template creation and storage system can be used to create extraction templates for batch extraction of information from remote web sites. Also, the data record information extracted from a web page, which is used to find the same or similar products at other web sites, can be stored in a central product record database created with the batch extraction system described above.
Owner:PAPPAS DEREK EDWIN +1

Method for Extracting Data from Web Pages

Embodiments of the invention describe a computer-implemented method for extracting data from web pages. During a learning stage, the embodiments receive a template web page represented by a template Document Object Model (DOM) and select a record node, which is a root node of a sub-tree of the template DOM that contains data to be extracted. After that, a record node sub-tree and data field sub-paths are stored in a memory, wherein the record node is a root node of the record node sub-tree, and the data field sub-paths are relative paths of the template DOM from the record node to data field nodes. During the extraction stage, a web page represented by a DOM-tree is received and a matched sub-tree of the DOM-tree according to a structure of the record node sub-tree is identified. Next, data from the matched sub-tree according to the data field sub-paths are extracted.
Owner:MITSUBISHI ELECTRIC RES LAB INC
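
The learning and extraction stages described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: pages are assumed to be well-formed XML so that `xml.etree.ElementTree` can parse them, a sub-tree "matches" when its tag structure is identical to that of the record node, and the helper names (`child_path`, `shape`, `follow`, `learn`, `extract`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def child_path(root, target):
    """Return the child indexes leading from root to target, or None."""
    if root is target:
        return []
    for i, child in enumerate(root):
        sub = child_path(child, target)
        if sub is not None:
            return [i] + sub
    return None


def shape(node):
    """Structural signature of a sub-tree: tag names only, text ignored."""
    return (node.tag, tuple(shape(c) for c in node))


def follow(node, path):
    """Walk from node down the given child indexes."""
    for i in path:
        node = node[i]
    return node


def learn(record_node, fields):
    """Learning stage: keep the record sub-tree shape and field sub-paths."""
    return {
        "shape": shape(record_node),
        "paths": {name: child_path(record_node, n) for name, n in fields.items()},
    }


def extract(page_root, model):
    """Extraction stage: find matching sub-trees and read the data fields."""
    records = []
    for node in page_root.iter():
        if shape(node) == model["shape"]:
            records.append({name: (follow(node, p).text or "").strip()
                            for name, p in model["paths"].items()})
    return records


# Hypothetical template page and target page.
TEMPLATE = "<html><body><div><h2>Widget</h2><p><span>9.99</span></p></div></body></html>"
PAGE = ("<html><body>"
        "<div><h2>Lamp</h2><p><span>12.50</span></p></div>"
        "<div><h2>Desk</h2><p><span>80.00</span></p></div>"
        "</body></html>")

record = ET.fromstring(TEMPLATE).find("./body/div")
model = learn(record, {"name": record.find("./h2"),
                       "price": record.find("./p/span")})
records = extract(ET.fromstring(PAGE), model)
```

Exact shape matching is the simplest possible choice; real pages would need a more tolerant structural comparison and an HTML parser that accepts malformed markup.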

An image search method and its search engine

Inactive | CN102270234A | Implement the extraction function | Achieve a specific effect | Special data processing applications | Image segmentation | Imaging Feature
The invention provides an image search method and a search engine for it. The method obtains similar images by crawling pictures across the whole network, extracts the context and subject information of the pictures from the source webpages of the similar images, and finally returns image search results based on both the semantic features and the visual features of the images. The image search engine includes an acquisition module, a primary search module, a secondary search module, a word segmentation module and a determination module. The acquisition module obtains the source image, the primary search module obtains a set of similar images, and the secondary search module builds a data structure of the web page information for the similar image set. The word segmentation module marks the position weight of the picture context, extracts the longest phrase and marks the word weight, and the determination module extracts the core subject words and crawls relevant picture information. The invention provides a search engine and search method that use images more comprehensively to search for subject information and related images, and users can adapt it to the needs of different scenarios to achieve specific effects.
Owner:BEIHANG UNIV

Method and system for extracting Web information based on Nutch

The invention discloses a system for extracting Web information based on Nutch. The system comprises an information extraction module, a storage module, an index module and a retrieval module, wherein the information extraction module is used for capturing webpage data from the Internet through the Nutch framework and analyzing the data; the storage module is used for storing webpage extraction files in which the webpage data is filtered; the index module is used for transmitting the webpage information collected by Nutch to Solr to establish an index; and the retrieval module is used for using Solr to respond to a user query request and displaying the query result to the user as an XML page. The response and running speed, stability and expandability of information extraction are improved, the excessive storage space occupied by the program is reduced, and users can be assured of obtaining effective information in time.
Owner:NANTONG UNIVERSITY

Webpage extraction method based on attribute reproduction and labeled path

Inactive | CN102760150A | Efficiently determine the label path | Special data processing applications | Web extraction | Text string
The invention discloses a webpage extraction method based on attribute reproduction and labeled paths. The method comprises the following steps: constructing an attribute value seed set, containing some of the values of a target attribute, by extracting from a target website or an attribute value list page; acquiring some sample pages and determining, for each attribute, the relative labeled path between the attribute name and the attribute value; downloading some pages, constructing a training sample base, and storing the acquired code in a local database; finding and labeling all reproductions of each seed attribute value in the training webpages and recording the labeled path corresponding to each reproduction; taking the labeled path with the highest support for a given attribute as the extraction rule for webpages other than the training samples; traversing the HTML (Hypertext Markup Language) trees of the other webpages in the target website using the acquired labeled path, locating the label that holds the attribute value, and extracting the text string; and deleting attribute values that have no attribute name or an incorrect attribute name, and storing the correct attribute values in the local database, thereby completing the extraction of the page's attribute values.
Owner:NAT UNIV OF DEFENSE TECH
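
The "labeled path with the highest support" step in the abstract above might look something like the sketch below. It is a simplified illustration, not the patented procedure: a labeled path is assumed to be the sequence of (tag, sibling index) pairs from the page root to an element, pages are assumed to be well-formed XML, the attribute-name check is omitted, and the names `walk`, `learn_rule` and `apply_rule` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def walk(node, prefix=(), index=0):
    """Yield (labeled_path, text) for every element under node."""
    path = prefix + ((node.tag, index),)
    yield path, (node.text or "").strip()
    for i, child in enumerate(node):
        yield from walk(child, path, i)


def learn_rule(training_pages, seeds):
    """Count the path of every seed reproduction; return the best-supported one."""
    support = Counter()
    for html in training_pages:
        for path, text in walk(ET.fromstring(html)):
            if text in seeds:
                support[path] += 1
    if not support:
        raise ValueError("no seed value was found in the training pages")
    return support.most_common(1)[0][0]


def apply_rule(html, rule):
    """Extract the text found at the learned path in a new page."""
    return [text for path, text in walk(ET.fromstring(html))
            if path == rule and text]


# Hypothetical training pages; "Red" also appears in a heading on the first
# page, so that path gets support 1 while the table cell path gets support 2.
TRAINING = [
    "<html><body><h1>Red</h1><table><tr><td>Color</td><td>Red</td></tr></table></body></html>",
    "<html><body><h1>Phone X</h1><table><tr><td>Color</td><td>Blue</td></tr></table></body></html>",
]
NEW_PAGE = "<html><body><h1>Tablet</h1><table><tr><td>Color</td><td>Green</td></tr></table></body></html>"

rule = learn_rule(TRAINING, seeds={"Red", "Blue"})
values = apply_rule(NEW_PAGE, rule)
```

Voting across pages is what filters out accidental matches such as the heading on the first training page.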

Seat belt retractor for the safety belt of a motor vehicle

A seat belt retractor for a safety belt of a motor vehicle having a belt shaft (1) rotatably mounted in a frame, a profile head (2) lockable in relation to the frame, a load limiting device (20) located between the profile head (2) and the belt shaft (1) for enabling the belt shaft (1) to undergo a load limited rotation in the belt webbing extraction direction (A) with the profile head (2) being locked and a load limiting level predetermined by the load limiting device (20) being exceeded. The load limiting device (20) is formed from at least two load limiting elements (4, 5), by the activation of which the load limiting level can be switched from a lower to a higher level during the load limited belt webbing extraction, a first load limiting element (4) with a higher load limiting level is provided, which load limiting element with a first end (4a) is connected to the profile head (2), and a second load limiting element (5) with a lower load limiting level is provided, which load limiting element with a second end (5b) is connected to the belt shaft (1), and the second end (4b) of the load limiting element (4) with the higher load limiting level is connected to the first end (5a) of the load limiting element (5) with the lower load limiting level via a connecting element (6), and a coupling element (7) is provided for coupling the connecting element (6) to the belt shaft (1), via which coupling element the connecting element (6) can be coupled to the belt shaft (1) after the same has performed a rotation of a predetermined angle.
Owner:AUTOLIV DEV AB

Storm based stream computing frame text index method and system

The invention discloses a Storm based stream computing frame text index method and system. The method includes: implementing a Storm topology, designing a Storm real-time data processing framework, and completing a web spider program for automatic webpage extraction; automatically extracting key words; and classifying texts into one or more categories according to their content or attributes. The method and the system allow data to be restored from backup in the event of data corruption or data loss; centralized operation and maintenance of the system is provided; a convenient, visual graphical management interface is provided; function expansion can meet user demands for later expansion of the system and of its range of use; and a fault tolerance mechanism handles invalid data produced by user input or incorrect operations.
Owner:YUNNAN UNIV +2

Apparatus and method for sharing web contents using inspector script

An apparatus for sharing Web contents is provided. The apparatus includes a Web browser that loads and outputs a Web page, and a Web content transmission client that is linked with the Web browser to extract context information that is current state information from the Web page, and transmits the extracted context information to at least one other terminal.
Owner:ELECTRONICS & TELECOMM RES INST

Method of printing web page by using mobile terminal and mobile terminal for performing the method

A method of printing a web page by using a mobile terminal and a mobile terminal are provided. The method includes displaying the web page on the mobile terminal, extracting objects that are to be printed from the web page displayed on the mobile terminal, setting a layout of the extracted objects, and generating printing data by rendering the objects according to the layout.
Owner:HEWLETT PACKARD DEV CO LP