Cluster fault detection method and device

A fault detection and clustering technology, applied in error detection/correction, instrumentation, computing, etc., that addresses problems such as delayed troubleshooting, degraded user experience, and failure to detect faults in time

Pending Publication Date: 2020-01-21
BEIJING XIAOMI MOBILE SOFTWARE CO LTD
Cites: 4; cited by: 4

AI Technical Summary

Problems solved by technology

[0004] However, in large-scale clusters and multi-cluster scenarios, deploying a dedicated node inspection channel is too complicated and costly.
In such scenarios the workload of manual inspection is even greater, so faults are not discovered in time, troubleshooting is seriously delayed, and network performance is impaired, which greatly degrades the user experience.



Embodiment Construction

[0089] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with aspects of the invention as recited in the appended claims.

[0090] A general inspection method can only obtain the running status of node services and then present it by other means. Because inspection happens only at the service level, once a problem is found you need to check the service logs or related monitoring and manually troubleshoot and locate the fault. In the case of multiple clusters, it is also necessary to switch back and forth between several sys...


Abstract

The invention relates to a cluster fault detection method and device in the field of computer Internet technology. It addresses the problem that neither manual inspection nor the deployment of dedicated agent-based inspection can meet complex inspection requirements in large-scale and multi-cluster scenarios. The method comprises the following steps: detecting, among the nodes in a cluster, a service exception node, that is, a node whose service is abnormal; when a service exception node is detected, obtaining at least one upstream service cluster that has a first dependency relationship with the cluster to which the service exception node belongs; and detecting service state information of each upstream service cluster, where the service state information indicates whether the service of that upstream service cluster is normal or abnormal. The technical solution provided by the invention is suited to large-scale cluster inspection and enables efficient, accurate inspection in network environments with high service complexity.
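The three steps in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the `Node` record, its `healthy` flag (standing in for whatever node-level service check is actually used), the `dependencies` map from a cluster to the clusters it directly depends on (used here to model the "first dependency relationship"), and the rule that a cluster is abnormal if any of its nodes is abnormal are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass
class Node:
    name: str
    cluster: str
    healthy: bool  # hypothetical: outcome of a node-level service check


def find_exception_nodes(nodes: list[Node]) -> list[Node]:
    """Step 1: return the nodes whose service is abnormal."""
    return [n for n in nodes if not n.healthy]


def upstream_clusters(cluster: str, dependencies: dict[str, list[str]]) -> list[str]:
    """Step 2: clusters that `cluster` directly depends on.

    Assumption: the "first dependency relationship" is modeled as a direct
    edge in a hypothetical cluster -> upstream-clusters map.
    """
    return dependencies.get(cluster, [])


def cluster_state(cluster: str, nodes: list[Node]) -> ServiceState:
    """Step 3: service state of one upstream cluster.

    Assumption: a cluster is ABNORMAL if any of its nodes is unhealthy;
    a cluster with no known nodes is reported as NORMAL.
    """
    members = [n for n in nodes if n.cluster == cluster]
    if any(not n.healthy for n in members):
        return ServiceState.ABNORMAL
    return ServiceState.NORMAL


def inspect(nodes: list[Node], dependencies: dict[str, list[str]]) -> dict[str, dict[str, ServiceState]]:
    """For each abnormal node, map every upstream cluster to its service state."""
    report: dict[str, dict[str, ServiceState]] = {}
    for node in find_exception_nodes(nodes):
        ups = upstream_clusters(node.cluster, dependencies)
        report[node.name] = {up: cluster_state(up, nodes) for up in ups}
    return report


if __name__ == "__main__":
    nodes = [
        Node("web-1", "web", healthy=False),
        Node("web-2", "web", healthy=True),
        Node("db-1", "db", healthy=False),
        Node("cache-1", "cache", healthy=True),
    ]
    dependencies = {"web": ["db", "cache"]}  # web depends on db and cache
    for name, upstream in inspect(nodes, dependencies).items():
        print(name, {c: s.value for c, s in upstream.items()})
    # web-1 {'db': 'abnormal', 'cache': 'normal'}
    # db-1 {}
```

In this toy run the abnormal node `web-1` is reported together with the abnormal `db` cluster it depends on, which points at a likely upstream cause, while `db-1` has no upstream clusters in the example map. The sketch follows only direct upstream edges; whether to walk the dependency chain further is a design choice the source does not specify.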

Description

Technical field

[0001] The present disclosure relates to computer Internet technology, and in particular to a cluster detection method and device.

Background

[0002] A general inspection method can only obtain the running status of node services and then present it by other means. Because inspection happens only at the service level, once a problem is found you need to check the service logs or related monitoring and manually troubleshoot and locate the fault. In the case of multiple clusters, it is also necessary to switch back and forth between several systems to troubleshoot problems. As services keep growing in complexity, the number of servers in a cluster has grown to tens of thousands, and inspection has become correspondingly harder.

[0003] A distributed inspection system for automatic inspection can be formed by deploying multiple agent inspection execution modules as the channel for node inspection. The central module assig...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F11/30
CPC: G06F11/3006; G06F11/3055; G06F11/3089
Inventor: 刘志杰 (Liu Zhijie)
Owner: BEIJING XIAOMI MOBILE SOFTWARE CO LTD