Collaborative analysis and localization method for multi-source faults for large-scale systems

A fault analysis and localization method, applied in transmission systems, digital transmission systems, instruments, etc. It improves the automatic analysis and localization of system faults and addresses the problem of locating faults accurately.

Active Publication Date: 2021-03-23
JIANGNAN INST OF COMPUTING TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a multi-source fault collaborative analysis and localization method for large-scale systems that improves the system's ability to analyze and locate faults automatically and solves the problem of accurately locating faults in large-scale parallel systems.



Examples


Embodiment

[0032] Embodiment: A multi-source fault collaborative analysis and localization method for large-scale systems, applied to large-scale heterogeneous systems and built on the following modules:

[0033] IPMI protocol fault monitoring module, used for fault collection of commercial servers;

[0034] SNMP protocol fault monitoring module, used for fault collection of network equipment;

[0035] Node heartbeat fault monitoring module, used for fault collection of special equipment software;

[0036] Maintenance channel fault monitoring module, used for fault collection of special equipment hardware;

[0037] System core fault monitoring module, used to monitor system environment faults and Panic information, and to notify the maintenance system service through registers so that the faults are stored and registered;

[0038] Resource management fault monitoring module, used to monitor the status of software and hardware, and to register faults through the resource management service;

[0039] The job s...


Abstract

The invention discloses a multi-source fault collaborative analysis and localization method for large-scale systems, which includes the following steps: S1. The faults collected by each fault monitoring module are classified uniformly, and for each fault a fault code Fid, an upper association list Fuplist, and a lower association list Fdownlist are defined. Fuplist contains the fault codes Fid of the faults that can induce this fault, and Fdownlist contains the fault codes Fid of the faults that this fault can induce; S2. The fault analysis system receives the faults sent by each fault monitoring module and forms a current fault list; S3. The fault analysis system performs contextual analysis on the current fault list; S10. The fault analysis system achieves precise localization of a fault Fk and jumps to S4. The invention improves the ability to analyze and locate system faults automatically and solves the problem of accurately locating faults in large-scale parallel systems.
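The data structure in step S1 can be illustrated with a short Python sketch. The abstract does not spell out how the contextual analysis in S3 works, so the rule used here is only an assumption: a fault in the current list is treated as a candidate root cause when none of the faults in its Fuplist is also in the current list. The fault codes (`PSU_FAIL`, `NODE_DOWN`, and so on) and the names `FaultDef`, `candidate_roots`, and `induced_by` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultDef:
    """One entry of the unified fault classification (step S1)."""
    fid: str                             # fault code Fid
    fuplist: frozenset = frozenset()     # Fids that can induce this fault
    fdownlist: frozenset = frozenset()   # Fids that this fault can induce


def candidate_roots(catalog, current):
    """Assumed rule: a current fault is a candidate root cause if none of
    the faults in its Fuplist is also present in the current fault list."""
    active = set(current)
    return sorted(f for f in active if not (catalog[f].fuplist & active))


def induced_by(catalog, root, current):
    """Current faults reachable from `root` by following Fdownlist links."""
    active = set(current)
    seen, stack = set(), [root]
    while stack:
        fid = stack.pop()
        for child in catalog[fid].fdownlist:
            if child in active and child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(seen)


# Hypothetical catalog: a power supply failure takes a node down,
# which in turn causes a lost heartbeat; a link failure can do the same.
catalog = {
    "PSU_FAIL": FaultDef("PSU_FAIL", fdownlist=frozenset({"NODE_DOWN"})),
    "NODE_DOWN": FaultDef(
        "NODE_DOWN",
        fuplist=frozenset({"PSU_FAIL"}),
        fdownlist=frozenset({"HEARTBEAT_LOST"}),
    ),
    "LINK_DOWN": FaultDef("LINK_DOWN", fdownlist=frozenset({"HEARTBEAT_LOST"})),
    "HEARTBEAT_LOST": FaultDef(
        "HEARTBEAT_LOST", fuplist=frozenset({"NODE_DOWN", "LINK_DOWN"})
    ),
}

# Current fault list as it might be assembled in step S2.
current = ["HEARTBEAT_LOST", "NODE_DOWN", "PSU_FAIL"]

roots = candidate_roots(catalog, current)
print(roots)                                   # ['PSU_FAIL']
print(induced_by(catalog, roots[0], current))  # ['HEARTBEAT_LOST', 'NODE_DOWN']
```

Storing both lists on each fault makes the association graph navigable in either direction: Fuplist answers "what could have caused this?" and Fdownlist answers "what else should this explain?". The method's actual analysis steps may differ from the simple rule assumed here.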

Description

Technical field

[0001] The invention relates to a multi-source fault collaborative analysis and localization method for large-scale systems, and belongs to the technical field of availability management of large-scale parallel systems.

Background

[0002] With the continuous development of high-performance computing technology, large-scale parallel systems keep growing in scale: their resources are diverse and numerous, their software and hardware structure is complex, and the number of components keeps increasing. As a result, the mean time between failures (MTBF) of large-scale systems has dropped significantly, component failure has become a normal event during system operation, and the fault-tolerant system has become the basic support for stable and reliable operation. Rapid discovery and accurate localization of system faults are the basis for fault handling and fault-tolerant operation.

[0003] In m...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/30, H04L12/24
CPC: G06F11/3051, G06F11/3058, H04L41/0677
Inventor: 高剑刚, 龚道永, 宋长明, 钱宇, 李伟东, 张宏宇, 刘沙
Owner: JIANGNAN INST OF COMPUTING TECH