63 results about "Web scraping" patented technology

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
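
As a minimal illustration of copying specific data from a page into a spreadsheet-like format, the sketch below parses an HTML table and writes it out as CSV. It uses only the Python standard library; the names `TableScraper` and `html_table_to_csv` are hypothetical, and the HTML is supplied as a string rather than fetched over HTTP.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(parser.rows)
    return buf.getvalue()
```

A real scraper would first download the page and would need to cope with nested tables and malformed markup, which this sketch ignores.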

System and method for presenting and inputting information on a mobile device

Inactive | US20070250711A1 | Perceived latency | Providing user control | User identity/authority verification | Information format | Session management | Web service
Disclosed are combinations of authentication, session management and web scraping implemented on a mobile device to support a rich mobile application that uses secure connections to existing websites to access data sources. The mobile application presents information in logical units rather than screen by screen, and fetches data in the background for low perceived delay. The mobile application provides consistent navigation using the 12-key or QWERTY keypad. The mobile application maintains a history of screens, allowing the user to easily return to a prior screen. A web server allows phrases to be configured online by an individual user and downloaded to that user's mobile device to simplify data entry on the mobile device. Also disclosed are a method of embedding user profile information in a signed application executable file, which allows applications to be pre-configured per user, and a licensing mechanism that supports multiple distribution channels.
Owner:PHONIFIED

Web page collecting method and web page collecting server

The invention discloses a web page fetching method and a web page fetching server. The method comprises: A. receiving a web page request; B. determining whether the requested web page has already been fetched, and executing step C if so, otherwise fetching the web page and ending the flow; C. determining whether the time elapsed since the requested web page was fetched exceeds a preset time threshold, and executing step D if so, otherwise not fetching the web page and ending the flow; D. checking whether the web page has been updated, and fetching it if so, otherwise not fetching it. The server comprises: a web page request receiving module, an estimation module, a searching module and a fetching module. The invention lightens the load on the web page fetching server, reduces network bandwidth consumption and improves the efficiency of web page fetching.
Owner:TENCENT TECH (SHENZHEN) CO LTD
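
Steps A to D of the abstract above can be sketched as a small decision function. This is only an illustration under assumptions: the threshold is read as the age of the cached copy, and `CachedPage`, `handle_request`, `fetch` and `is_updated` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedPage:
    content: str
    fetched_at: float  # time of the last fetch, in seconds


def handle_request(
    url: str,
    cache: Dict[str, CachedPage],
    now: float,
    max_age: float,
    fetch: Callable[[str], str],
    is_updated: Callable[[str, CachedPage], bool],
) -> str:
    """Return 'fetched', 'cached' or 'refetched' for one page request (step A)."""
    page = cache.get(url)
    if page is None:
        # Step B: the page was never fetched, so fetch it and stop.
        cache[url] = CachedPage(fetch(url), now)
        return "fetched"
    if now - page.fetched_at <= max_age:
        # Step C: the cached copy is younger than the threshold, do not fetch.
        return "cached"
    if is_updated(url, page):
        # Step D: the copy is old and the page changed, so fetch it again.
        cache[url] = CachedPage(fetch(url), now)
        return "refetched"
    return "cached"
```

In practice `is_updated` might be implemented with a conditional HTTP request, but the abstract does not say how the update check is performed.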

Concierge robot system, concierge service method, and concierge robot

A concierge robot system, a concierge service method, and a concierge robot are provided. The system provides an artificial intelligence type of concierge service, and includes: a user interface device that receives an external image and an external voice, and outputs the received image or voice on a screen or by voice; a storage device where a program that provides data through the user interface device based on learning data generated by using a neural network model is stored; and a processor that executes the program, wherein the program includes instructions for recognizing an emotion of a user, identified from the external image based on the learning data, outputting data that represents an emotion according to the emotion recognition to the screen, generating a conversation sentence that corresponds to natural language of web data externally collected through web scraping based on the learning data and outputting it by voice, generating user recommendation data for the identified user based on the learning data, and outputting the user recommendation data on the screen or processing it into natural language and outputting a corresponding conversation sentence by voice.
Owner:ROBORUS CO LTD

Interactive web crawling

A crawler that is either based on an interactive mode of operation or includes an interactive mode along with one or more other modes, such as automatic or manual. Similar to an automatic mode crawler, the crawler traverses web sites, web content and links. However, if the crawler encounters a structure that requires human interaction, such as a form, a radio button selector, a drop down selector, a human verification test, etc., the crawler pauses and prompts a user to take action.
Owner:HEWLETT PACKARD DEV CO LP
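
The pause-and-prompt behaviour described in the abstract above could look roughly like the following sketch, which scans a page for structures that need a human and hands control to a callback when it finds any. `InteractionDetector`, `crawl_step` and `ask_user` are hypothetical names, and the set of detected elements is an assumption, not the patent's list.

```python
from html.parser import HTMLParser


class InteractionDetector(HTMLParser):
    """Records page structures that typically need human input."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("form", "select"):
            self.found.append(tag)
        elif tag == "input" and attributes.get("type") == "radio":
            self.found.append("radio")


def crawl_step(html, ask_user):
    """Continue automatically, or pause and ask the user when needed."""
    detector = InteractionDetector()
    detector.feed(html)
    detector.close()
    if detector.found:
        # Interactive mode: hand the detected structures to the user.
        return ask_user(detector.found)
    return "auto"
```

Here `ask_user` stands in for whatever prompt the crawler shows; a command-line tool might wrap `input()`, for example.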

Method and system for grabbing web pages from servers with different IPs (Internet Protocols) in website

The invention discloses a method and a system for grabbing web pages from servers with different IPs (Internet Protocol addresses) in a website. The method comprises the following steps: assigning an IP of a target website server to a client's web-page grabbing task, wherein the task includes the addresses of the web pages to be grabbed; then judging whether the task satisfies the server's courteous (polite) access condition and, if so, using the IP to establish a connection with the server and grabbing the web pages at those addresses from the server. Because the access strategy operates at the IP level, acquisition worker threads can be controlled more conveniently to access the website courteously; by caching DNS (Domain Name System) results, using a plurality of IPs simultaneously and preferentially assigning the fastest IP, the efficiency of grabbing web pages is greatly improved; and when individual servers of the target website cannot be accessed, the task can switch to servers with other IPs in time, improving fault tolerance.
Owner:NEW FOUNDER HLDG DEV LLC +2
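
A rough sketch of IP-level courteous access with a preference for the fastest IP follows. It assumes the courtesy condition is a minimum interval between requests to the same IP, which the abstract does not specify; `IpScheduler`, `pick_ip` and `min_interval` are hypothetical names.

```python
from typing import Dict, List, Optional


class IpScheduler:
    """Chooses a server IP for each fetch while rate-limiting per IP."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval          # seconds between hits on one IP
        self.last_access: Dict[str, float] = {}   # IP -> time of last request

    def pick_ip(self, ips_fastest_first: List[str], now: float) -> Optional[str]:
        """Return the fastest IP whose courtesy interval has elapsed, if any."""
        for ip in ips_fastest_first:
            last = self.last_access.get(ip)
            if last is None or now - last >= self.min_interval:
                self.last_access[ip] = now
                return ip
        return None  # every IP was contacted too recently; the caller waits
```

Falling through to the next IP in the list also gives a simple form of the failover the abstract mentions, provided unreachable IPs are removed from the list by the caller.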

Automatic form filling

A user, such as a merchant, populates a form, such as an on-boarding form from a payment provider, with information unique to the user. In one embodiment, the user enters a URL address. The information is communicated to the payment provider, who then obtains additional information about the user based on the information. In one embodiment, web scraping results in a name, address, a phone number, and / or an email address for the user. The form is populated with this information by the payment provider and returned to the user for confirmation or correction. As a result, the form filling is easier for the user.
Owner:PAYPAL INC

Creation of data extraction rules to facilitate web scraping of unstructured data from web pages

The present invention provides a method, system, and computer program to help a user without any programming knowledge create data extraction rules for collecting data from websites at scale. A user only needs to provide a web page Uniform Resource Locator (URL), then mark the needed data and assign each piece to its type. For example, on an e-commerce website, this data can be the product name, price, description, and so forth. Marking is done by highlighting the correct part of the web page. This creates a data extraction rule that describes the web template of the full website and can be used thereafter for automated web scraping from all pages on a particular website.
Owner:PROFITERO
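
One way to picture a data extraction rule is as a mapping from a field type to the page element that holds it. The sketch below assumes a rule is simply a (tag, CSS class) pair per field; that representation, and the names `RuleExtractor` and `extract`, are illustrative assumptions rather than the patented rule format.

```python
from html.parser import HTMLParser


class RuleExtractor(HTMLParser):
    """Applies rules of the form field -> (tag, css_class) to one page."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.result = {}
        self._active = None  # field whose element is currently open

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (rule_tag, rule_class) in self.rules.items():
            if field not in self.result and tag == rule_tag and rule_class in classes:
                self._active = field
                self.result[field] = ""
                break

    def handle_data(self, data):
        if self._active is not None:
            self.result[self._active] += data

    def handle_endtag(self, tag):
        # Simplification: assumes the matched element does not nest its own tag.
        if self._active is not None and tag == self.rules[self._active][0]:
            self.result[self._active] = self.result[self._active].strip()
            self._active = None


def extract(html, rules):
    """Return {field: text} for the first element matching each rule."""
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.result
```

Because the rule describes the template rather than one page, the same `rules` dictionary can be reused on every product page that shares the layout.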

System and Method for Automatically Extracting and Analyzing Data

A system and computer-implemented method for automatically extracting and analyzing data from one or more data sources is provided. The system comprises a platform manager configured to provide options for configuring rules for data extraction. The system further comprises a web scraping and crawling module configured to extract data from one or more data sources by executing one or more data extraction jobs using the configured rules. Furthermore, the system comprises an information extraction engine configured to analyze the extracted data by performing one or more analytical operations, decipher the analyzed data using pre-stored vocabularies and classify the deciphered data. The information extraction engine is further configured to convert at least one of: the analyzed data, the deciphered data and the classified data to one or more formats for use by at least one of: one or more enterprise applications, enterprise portals and one or more communication channels.
Owner:COGNIZANT TECH SOLUTIONS INDIA PVT

Webpage screening method and device thereof

The invention discloses a webpage screening method and a webpage screening device. The method comprises: capturing a preset seed webpage; capturing the uniform resource locator (URL) information contained in the seed webpage; calculating a webpage quality score for each piece of URL information; dividing the URL information into candidate sets according to preset network address information; and screening out from each candidate set no more URL information than a preset pressure quota allows, such that the quality score of each selected URL is not lower than that of any URL remaining in the same candidate set. The preset pressure quota bounds the capture pressure placed on each network address. The webpages corresponding to the selected URL information are taken as the target webpages to capture. The method lowers the risk of capture failure and of the site banning the crawler, thereby improving the success rate of webpage capture.
Owner:人民数据管理(北京)有限公司
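
The screening step can be sketched as "keep the top-scoring URLs per host, up to a quota". This assumes the candidate sets are keyed by host name and that one quota applies to every host; `screen_urls` and `quota` are hypothetical names.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlsplit


def screen_urls(scored_urls, quota):
    """Keep at most `quota` URLs per host, preferring higher quality scores.

    `scored_urls` is an iterable of (url, score) pairs.
    """
    by_host = defaultdict(list)
    for url, score in scored_urls:
        # Group URLs into candidate sets by network address (here: the host).
        by_host[urlsplit(url).netloc].append((score, url))

    selected = []
    for items in by_host.values():
        # The best `quota` entries score at least as high as any that remain.
        selected.extend(url for _, url in heapq.nlargest(quota, items))
    return selected
```

Capping the number of URLs per host is what limits the load on any single site, which is the stated reason the method reduces the risk of being banned.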

Web page crawling method and spider

The invention discloses a web page crawling method and a spider. The method comprises the following steps: injecting seed URLs into a Web database; generating a URL list based on the Web database; feeding the URLs in the URL list to a web page crawler; crawling web pages by the web page crawler according to the fed URLs, conforming to the visit mode of the corresponding web page; and updating the URL states in the Web database and injecting newly found URLs based on the crawled web pages, wherein the visit mode comprises a request parameter socket, a response parameter socket, and the correspondence between the request parameter socket and the response parameter socket; the request parameter socket comprises request parameters as well as the mapping relationship between the request parameter socket and the response parameter socket; and the response parameter socket comprises a response parameter as well as information about the extraction position of the response parameter in the HTTP response message.
Owner:FUJITSU LTD

Web page collecting method and device as well as browser

The invention is suitable for the technical field of computers and provides a web page collecting method and device as well as a browser. The web page collecting method comprises the following steps of receiving a web page collecting command, and obtaining a web page link corresponding to a web page; calling a web page grabbing server in a cloud server farm to grab web page content corresponding to the web page link according to the web page link; saving the web page content in a cloud storage server in the cloud server farm. According to the web page collecting method and device as well as the browser, which are disclosed by the invention, the cloud storage of the collected web page content is realized, and the long-term validity of the collected web page is ensured, so that the collected web page content is not limited by time and access addresses, and the function of a bookmark of the browser is expanded.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Method for automatically finding network content quotation

Active | CN1770159A | Speed up the auto-discovery process | Low hardware requirement | Special data processing applications | Information retrieval | Natural language understanding
The invention relates to a method for automatically finding quotations of network content. The method comprises: introducing a pre-search process to accelerate automatic discovery, and using the indexing service provided by search engine websites to eliminate web page grabbing and content index building. The invention has the advantages of low hardware requirements and of being able to protect the intellectual property of network content.
Owner:新方正控股发展有限责任公司 +2

Webpage data analyzing and processing method

The invention discloses a webpage data analyzing and processing method which is implemented on the basis of a webpage data service platform. The webpage data service platform comprises a client, a content server and a word segmentation cloud server, and a webpage capturing system, a content extraction system, a content analyzing system and a database are installed on the content server. The method specifically includes the steps of S1, webpage capturing; S2, content extracting; S3, Chinese word segmentation; S4, content analyzing; S5, result displaying, in which the client retrieves the data results from the database and displays them to users. By adopting reading-habit-based webpage content extraction, the subject content of a webpage can be rapidly recognized and extracted; Chinese word segmentation is performed effectively using cloud segmentation, which provides a fundamental guarantee for big-data analysis; users need not invest in software and hardware resources; and the requirements of small and medium-sized enterprises and ordinary individual users for low-cost, targeted big-data analysis services can be met.
Owner:ZHANGZHOU COLLEGE OF SCI & TECH

Internet simulation browser-based method for acquiring data in credit investigation system

The invention provides an internet simulation browser-based method for acquiring data in a credit investigation system. Credit data of a customer to be approved are acquired from an external system, and the credit data are used for consultation of credit card investigation personnel. The method comprises the following steps of: A, analysis simulation of a target resource site; B, web page capturing: acquiring a result page to be queried and returned from the target resource site, and storing the result page in a local computer; C, information extraction: extracting the required information from the result page; and D, data storage: storing the extracted required information record in an information record database in a credit card credit investigation unit. By adopting the method, the credit data of the customer to be approved can be conveniently acquired from the external system, and the efficiency of credit card credit investigation is remarkably improved.
Owner:CHINA CONSTRUCTION BANK

HTML template-based method, equipment and system for releasing graphics and text information via television

Inactive | CN105007539A | Improve content production efficiency | Improve production efficiency | Selective content distribution | Graphics | Data scraping
The application discloses an HTML template-based method for releasing graphics and text information via a television. The method comprises the following steps: setting a template file in an HTML/XHTML format suitable for playback on the television, wherein the template file comprises a display frame and a formal parameter, the display frame defines how metadata is displayed, and the formal parameter is embedded at a specific position in the display frame; using a data capturing technology to capture the metadata from a webpage and storing the metadata in a database; extracting the metadata from the database and filling it into the template file in place of the formal parameter, then rendering to generate a graphics and text page; converting the graphics and text page into a picture format; and converting the picture-format page into a PAL video signal of a television program channel and playing it on a television screen. The application further discloses an HTML template-based system for releasing the graphics and text information via the television, and a content releasing server.
Owner:孙巍
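
The template-filling step of the abstract above (replacing formal parameters with captured metadata) can be illustrated with Python's `string.Template`. The placeholder names `$title` and `$body`, the markup, and the `render` function are assumptions for the example; rendering to a picture and to a PAL signal is outside this sketch.

```python
from html import escape
from string import Template

# Display frame with two formal parameters embedded at fixed positions.
PAGE_TEMPLATE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")


def render(metadata):
    """Fill the template with metadata values, escaping HTML special characters."""
    safe = {key: escape(value) for key, value in metadata.items()}
    return PAGE_TEMPLATE.substitute(safe)
```

Escaping the captured values keeps scraped text from breaking the page layout when it contains characters such as `&` or `<`.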

Method and device for reading webpage resources, and electronic equipment

The embodiment of the invention discloses a method and a device for reading webpage resources, and electronic equipment. The method is applied to the webview controls of an Android operating system of 4.0 to 4.3 versions. The method comprises the following steps: if the loading state of webpage resources to be fetched is loading completion, obtaining the URL (Uniform Resource Locator) information of the webpage resources to be fetched, wherein the webpage resources to be fetched correspond to an obtained webpage fetching request; according to the package name of an application program which constructs the current webpage, obtaining a resource cache file path mapped by the package name; extracting a binary data file under the resource cache file path, and traversing the binary data file to obtain an information field matched with the URL information; and inquiring information before the matched information field, obtaining preset symbolic information, obtaining a webpage resource file corresponding to the URL information according to the information before the symbolic information and a filename calculation strategy, and reading the webpage resource file under the resource cache file path. The method and the device can be applied to improve web resource utilization efficiency.
Owner:KINGSOFT

Web crawling method and server

A web page grabbing method is provided. A target web page on a website is grabbed, the target web page including a web page corresponding to a Hypertext Markup Language 5 (H5) content and a web page corresponding to a non-H5 content. The web page corresponding to the H5 content is detected according to web page source code of the target web page. Dynamic rendering is performed on the web page corresponding to the H5 content, to obtain a rendered web page. Content details information corresponding to the H5 content is extracted from the rendered web page.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Filtering expression and rendering engine based method for automatically monitoring update of dynamic webpage

The invention discloses a filtering expression and rendering engine based method for automatically monitoring updates of a dynamic webpage. A user designates a part of the webpage that interests them as a concerned point through a visual interface, and an application or a client automatically generates a filtering expression corresponding to the concerned point; a server renders the dynamic webpage with the rendering engine to obtain the same page the user sees, and extracts the user's concerned point; and when the concerned point is updated, the server pushes the updated content to the user in time. By helping the user designate the concerned point and by using the rendering engine to check for webpage updates automatically at the server, the method realizes a customizable dynamic webpage monitoring program, solves the lack of customization in conventional information subscription modes such as RSS (Really Simple Syndication), overcomes the inability of conventional webpage capture to analyze dynamic webpages, and improves the efficiency with which the user obtains webpage updates.
Owner:SOUTHEAST UNIV
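
The change-detection half of the method above can be sketched as follows: apply a filter to the rendered page, hash the extracted part, and report it only when the hash differs from the previous check. The rendering engine is not modelled here; `FocusMonitor` and `price_filter` are hypothetical, and the regular expression stands in for whatever filtering expression the client would generate.

```python
import hashlib
import re


class FocusMonitor:
    """Reports the concerned point only when its content has changed."""

    def __init__(self, extract):
        self._extract = extract      # filter: rendered HTML -> text of interest
        self._last_digest = None

    def check(self, rendered_html):
        """Return the new text if the concerned point changed, else None."""
        text = self._extract(rendered_html)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        changed = self._last_digest is not None and digest != self._last_digest
        self._last_digest = digest
        return text if changed else None


def price_filter(html):
    """Example filter: the text of the element with id 'price'."""
    match = re.search(r'<span id="price">(.*?)</span>', html)
    return match.group(1) if match else ""
```

Because only the filtered part is hashed, changes elsewhere on the page, such as rotating advertisements, do not trigger a notification.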

Policy big data mining method and device, computer equipment and storage medium

The embodiment of the invention belongs to the technical field of artificial intelligence, and relates to a policy big data mining method and device, computer equipment and a storage medium. The invention also relates to blockchain technology: the user's target policy data can be stored in a blockchain. According to the application, a webpage capturer performs a data capture operation on the policy object URLs in a policy object list to obtain initial policy data, and the initial policy data is automatically integrated into target policy data that is convenient for workers to check. Throughout the process, the relevant information is obtained without manually querying policy data, which effectively reduces manpower consumption and cost while ensuring the accuracy of information extraction and increasing the data extraction rate.
Owner:广州博士信息技术研究院有限公司

System and method for recognizing posture of single pig body based on stacked hourglass network

The invention provides a system and method for recognizing the posture of a single pig body based on a stacked hourglass network. The method comprises the steps of: acquiring image data and video data on the spot through photographing equipment, acquiring further image data through a web scraping technology, learning a preprocessed data set by training an improved stacked hourglass network model, automatically acquiring a skeleton diagram of a single pig in a pigsty within a controllable shooting range, and determining the category of the pig according to the obtained pig body posture data; and when a certain posture of the pig is abnormal, warning and prompting a breeder to enter the pigsty to further check the health condition of that pig. With the system and the method, pigs on a farm can be effectively managed, and the pressure on technicians such as feeders is relieved to a certain extent.
Owner:CHINA AGRI UNIV

Direct leg access for proxy web scraping

Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
Owner:OXYLABS UAB

Web page grabbing method and web page grabbing system based on big data

The invention provides a web page grabbing method and a web page grabbing system based on big data. The web page grabbing method comprises the following steps: receiving a web page request from a user; classifying the big data according to the keyword classification of the web page; and transmitting the classified web page that corresponds to the web page request to the user. The web page grabbing method and the web page grabbing system provided by the invention have the advantage of making web page grabbing highly convenient.
Owner:SHENZHEN BOXINNUODA ECONOMIC RELATIONS & TRADE CONSULTANTS CO LTD

Data crawling implementation method based on distributed crawler technology

The invention discloses a data crawling implementation method based on a distributed crawler technology, which relates to the technical field of data crawling, and comprises the following steps: S1, specifying a URL, following the given address to find the address that ultimately holds the data to be grabbed, and obtaining the corresponding index code; and S2, initiating a request, splicing website addresses from the codes obtained in step S1, and judging whether each address holds the required data; if it is a data page, calling back to the detail page, and otherwise continuing to search for data addresses in a loop. The method adopts focused capture and scripts; regular expressions, JSON data and data at yearly, monthly, quarterly and other frequencies are used in the capture process; incremental updating and batch insertion into an Oracle database are carried out; content is processed and screened during webpage capture so that, as far as possible, only webpage information related to the requirements is captured; and whether a character string matches a given pattern can be conveniently checked.
Owner:安徽经邦软件技术有限公司

Systems and methods for intelligent prospect identification using online resources and neural network processing to classify organizations based on published materials

In an illustrative embodiment, methods and systems for identifying prospective new clients based upon review of current clients include accessing a book of business of a user of a transactional platform to identify current clients, identifying key terms relevant to each of the clients, automatically performing Internet searches, each search using different groupings of the key terms, automatically deriving from web sites of the search results information regarding a number of prospects, and presenting prospect information to the user. The key terms may be identified through performing web searches and web scraping of online information related to the current clients. The clients may be categorized based upon organizational attributes prior to searching. An initial set or sets of key terms may be filtered through performing a number of data analyses on the key terms.
Owner:AON GLOBAL OPERATIONS LTD SINGAPORE BRANCH

Dynamic webpage crawling method and device

The invention discloses a dynamic webpage crawling method. The method comprises the following steps: arranging at least two queues; storing the URLs of the webpages to be crawled, together with their priorities, in the at least two queues and scheduling according to the priorities of the stored URLs; receiving elements dequeued in order from the at least two queues to obtain the URLs to be analyzed; and obtaining webpage content by analyzing the URLs of the queue elements. The method has the following beneficial effects: crawling, analysis and the URLs of the link library can be scheduled simultaneously according to priority, so that webpages with higher priorities are crawled first; scheduling at least two queues improves the enqueue and dequeue efficiency of webpages; and the time complexity is O(log N), so webpage crawling efficiency can be greatly improved.
Owner:DATAGRAND TECH INC
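
A logarithmic-time priority queue of URLs is commonly built on a binary heap, and the sketch below shows one such queue using Python's `heapq`. This is an assumed data structure, not necessarily the one in the patent, and it models a single queue; `UrlPriorityQueue` is a hypothetical name.

```python
import heapq
import itertools


class UrlPriorityQueue:
    """Priority queue of URLs; push and pop each take O(log N) time."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: equal priorities stay FIFO

    def push(self, url, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def pop(self):
        """Remove and return the URL with the highest priority."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A crawler following the abstract could keep one such queue for URLs awaiting download and another for pages awaiting analysis, popping from each in priority order.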

Cascade crawling method and device for multi-level pages based on web crawlers

The invention relates to a cascade crawling method for multi-level pages based on web crawlers. The method comprises: grabbing an upper-level page, storing the grabbed data in an upper-level page data analysis table, and setting primary key values for the objects in that table whose lower-level pages still need to be grabbed, wherein different objects have different primary key values; grabbing a lower-level page and storing the grabbed data in a lower-level page data analysis table; and setting a foreign key value for the lower-level page data analysis table by obtaining, from the upper-level page data analysis table, the primary key value of the object corresponding to the lower-level page and using it as the foreign key value, thereby enabling associated queries across upper-level and lower-level web pages after the grabbed data has been persisted. The method provides a data acquisition mode capable of restoring the logical relationship between successive web pages, ensures the integrity of web page capture, stores the data according to the original web page hierarchy, and makes the associated multi-level page data convenient to obtain.
Owner:厦门商集网络科技有限责任公司
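
The primary-key and foreign-key linkage between page levels described above can be sketched with an in-memory SQLite database. Table names, column names and the helper functions are hypothetical; the actual grabbing of pages is left out and only the storage and the associated query are shown.

```python
import sqlite3


def make_db():
    """Create the upper-level and lower-level page tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE upper_page (id INTEGER PRIMARY KEY, name TEXT, detail_url TEXT)"
    )
    conn.execute(
        "CREATE TABLE lower_page ("
        "id INTEGER PRIMARY KEY, "
        "upper_id INTEGER REFERENCES upper_page(id), "
        "body TEXT)"
    )
    return conn


def store_upper(conn, name, detail_url):
    """Store one object from the upper-level page and return its primary key."""
    cur = conn.execute(
        "INSERT INTO upper_page (name, detail_url) VALUES (?, ?)", (name, detail_url)
    )
    return cur.lastrowid


def store_lower(conn, upper_id, body):
    """Store lower-level page data, using the parent's primary key as foreign key."""
    conn.execute(
        "INSERT INTO lower_page (upper_id, body) VALUES (?, ?)", (upper_id, body)
    )


def joined(conn):
    """Associated query across the two page levels."""
    return conn.execute(
        "SELECT u.name, l.body FROM upper_page u "
        "JOIN lower_page l ON l.upper_id = u.id ORDER BY u.id"
    ).fetchall()
```

The crawler would call `store_upper` for each listing entry, fetch the entry's `detail_url`, and pass the returned key to `store_lower`, so each detail record stays attached to the listing row it came from.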