
Impala cluster log analysis method and system based on Hadoop

An Impala-based Hadoop cluster log analysis method and system, in the field of Hadoop cluster log analysis. It addresses three problems: Hadoop's Map/Reduce programming model is low-level, Map/Reduce programs are difficult to maintain and reuse, and they run inefficiently. The effect is improved data processing efficiency.

Active Publication Date: 2016-10-12
YONYOU NETWORK TECH
Cites: 2 | Cited by: 4

AI Technical Summary

Problems solved by technology

[0003] However, Hadoop's Map/Reduce programming model is relatively low-level: developers must write client programs, these programs are often difficult to maintain and reuse, and running Map/Reduce programs is inefficient.



Detailed description of embodiments

[0026] The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0027] An Impala-based Hadoop cluster log analysis method, comprising:

[0028] Configure the web server to create a new directory every day, under which the multiple log files produced by the application business system are written;

[0029] Configure the system timer (cron) to periodically import the log files generated on the previous day into HDFS in Hadoop and load the log file data into Hive;

[0030] After the Hive data has been loaded, configure the system timer (cron) again to periodically update the Hive metadata, start the Impala query program, extract the Hive metadata and calculate the statistical indicators;

[0031] After the above calculation and statistics are completed, set the system timer CR...
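Steps [0028] to [0031], together with the export step described in the abstract, map naturally onto three staggered cron jobs. The following Python sketch only builds the shell commands each job would run for a given day; it does not execute them. Everything concrete in it is an assumption rather than something the patent specifies: the directory layout and table names, the use of `hdfs dfs -put` plus Hive `LOAD DATA` for the import, Impala's `INVALIDATE METADATA` as the way of picking up the updated Hive metadata, a per-URL page-view count as the example indicator, and Sqoop as the HDFS-to-database export tool.

```python
from datetime import date, timedelta

# All paths, table names and the JDBC URL are hypothetical placeholders;
# the patent does not name any of them.
LOG_ROOT = "/data/weblogs"                  # web server creates one sub-directory per day here
HDFS_ROOT = "/user/etl/weblogs"             # HDFS landing area for raw log files
WAREHOUSE = "/user/hive/warehouse"          # assumed Hive warehouse location
HIVE_TABLE = "web_logs"                     # raw log table, partitioned by dt STRING
STATS_TABLE = "daily_page_stats"            # (url, pv) comma-delimited text table, partitioned by dt
JDBC_URL = "jdbc:mysql://report-db/weblog"  # reporting database
DB_TABLE = "page_stats"                     # target table in the reporting database


def daily_dir(day: date) -> str:
    """Directory the web server writes the given day's log files into."""
    return f"{LOG_ROOT}/{day:%Y%m%d}"


def build_pipeline(run_day: date) -> dict[str, list[list[str]]]:
    """Build the commands each cron job would run on run_day.

    Every job processes the logs of the previous day. Commands are argv lists
    (suitable for subprocess.run); nothing is executed here.
    """
    day = run_day - timedelta(days=1)
    dt = day.isoformat()                    # partition value, e.g. '2016-10-11'
    hdfs_dir = f"{HDFS_ROOT}/{day:%Y%m%d}"
    stats_sql = (
        f"INSERT OVERWRITE TABLE {STATS_TABLE} PARTITION (dt='{dt}') "
        f"SELECT url, COUNT(*) AS pv FROM {HIVE_TABLE} "
        f"WHERE dt='{dt}' GROUP BY url"
    )
    return {
        # Job 1: copy yesterday's directory into HDFS, then move it into a Hive partition.
        "import": [
            ["hdfs", "dfs", "-put", daily_dir(day), hdfs_dir],
            ["hive", "-e",
             f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {HIVE_TABLE} PARTITION (dt='{dt}')"],
        ],
        # Job 2: make Impala reload the Hive metadata, then compute the indicators.
        "compute": [
            ["impala-shell", "-q", f"INVALIDATE METADATA {HIVE_TABLE}"],
            ["impala-shell", "-q", stats_sql],
        ],
        # Job 3: push the computed partition from HDFS into the reporting database.
        "export": [
            ["sqoop", "export",
             "--connect", JDBC_URL,
             "--table", DB_TABLE,
             "--export-dir", f"{WAREHOUSE}/{STATS_TABLE}/dt={dt}",
             "--input-fields-terminated-by", ","],
        ],
    }
```

For example, `build_pipeline(date(2016, 10, 12))` targets partition `dt='2016-10-11'`. A wrapper script per job could run each argv with `subprocess.run(argv, check=True)`, and three crontab entries staggered through the early morning (the exact times are an arbitrary choice) would give the ordering the patent describes. A more robust setup would have each job confirm that the previous one finished instead of relying on fixed time gaps.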


Abstract

The invention discloses an Impala-based Hadoop cluster log analysis method and system. The method comprises the following steps: a web server is configured to create a new directory every day, under which the multiple log files produced by the application service system are written; the system timer (cron) is configured to periodically import the previous day's log files into HDFS in Hadoop and load the log file data into Hive; after the Hive data is loaded, the system timer (cron) is configured again to periodically update the Hive metadata, start an Impala query program, extract the Hive metadata and calculate statistical indicators; after the statistics are computed, the system timer (cron) is configured once more to periodically export the statistical indicator data from HDFS to a database, which makes subsequent queries convenient. The method and system improve data processing efficiency.

Description

Technical field

[0001] The present invention relates to the Internet field, and in particular to an Impala-based Hadoop cluster log analysis method and system.

Background art

[0002] The popularity of the Internet has made the web the largest information system in today's highly informationized society. Web logs record a large amount of user access information and contain the most important information about a website. Log analysis reveals, for example, how many visits the website receives, which page has the most visitors, and which page is the most valuable. A medium-sized website (more than 100,000 PV) generally produces more than 1 GB of web log files every day, and a large or very large website may produce 10 GB of data per hour.

[0003] However, Hadoop's Map/Reduce programming model is relatively low-level: developers must write client programs, which are often difficult to maintain and reuse, and the efficiency of running Map/Red...
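Paragraph [0002] names the questions log analysis should answer: how many visits the site receives and which page attracts the most visitors. In the pipeline these would be answered by a `GROUP BY url` aggregation in Impala over one day's partition. As a small, self-contained illustration of the same computation, the Python sketch below counts page views per URL in memory. It assumes Apache/Nginx common log format lines (the patent does not specify a format), counts only successful `GET` requests, and uses distinct client IPs as a crude stand-in for visitors; all of these are illustrative choices, not the patent's definitions.

```python
import re
from collections import Counter

# Assumed input: common log format lines, e.g.
# 1.2.3.4 - - [11/Oct/2016:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([A-Z]+) ([^"\s]+)[^"]*" (\d{3}) \S+')


def page_stats(lines):
    """Return (total page views, Counter of views per URL, distinct client IPs)."""
    per_url = Counter()
    visitors = set()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip lines that do not parse as common log format
        ip, method, url, status = m.groups()
        if method != "GET" or not status.startswith("2"):
            continue  # count only successful page requests
        per_url[url] += 1
        visitors.add(ip)  # distinct IPs as a rough proxy for visitors
    return sum(per_url.values()), per_url, len(visitors)
```

At the volumes quoted in [0002], a single-process loop like this is only suitable for checking the logic on a sample; the point of the described design is to run the equivalent aggregation in Impala over the data in HDFS.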


Application Information

Patent type & authority: Application (China)
IPC (8): G06F17/30
CPC: G06F16/1815, G06F16/183
Inventor: 肖松林 (Xiao Songlin)
Owner: YONYOU NETWORK TECH