A method, computer system and device for troubleshooting

Fault isolation technology for computer systems, applied in the computer field. It addresses problems such as fault diffusion, the inability to effectively isolate faulty devices, and the resulting loss of system reliability, with the aim of ensuring reliability.

Active Publication Date: 2016-08-17
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

In the prior art, there is a time window between the moment the faulty device generates an error message and the moment the operating system processes it. During this window, the CPU or other PCIe endpoint devices can still communicate with the faulty device, so the faulty device is not effectively isolated. The fault may therefore spread and affect the reliability of the system.



Examples


Embodiment 1

[0053] An embodiment of the present invention provides a fault isolation method. When an endpoint device in the extended domain fails, the method blocks mutual access between the main domain and that endpoint device, so that the fault does not spread to the main domain.

[0054] Figure 3 shows the flow chart of the fault isolation method provided by the embodiment of the present invention. The method is used in a computer system interconnected by PCIe. The computer system includes a main domain and an extended domain: the main domain is formed by a root complex, a first endpoint device, and an RCEP, and the extended domain is formed by the RCEP and a second endpoint device. The method comprises:

[0055] 101: Monitor a state of a second endpoint device of the extended domain.

[0056] The state of the second endpoint device may include a fault state and a non-fault state. The fault state indicates that the second end ...
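The monitoring step, together with the device state record mentioned in the abstract, can be pictured with a minimal Python sketch. The names `DeviceState`, `DeviceStateRecord`, `monitor_once`, and the `probe` callback are hypothetical and do not come from the patent; how the RCEP actually detects a fault is not covered by this excerpt, so the probe is left as a caller-supplied function.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class DeviceState(Enum):
    NON_FAULT = "non-fault"  # device can work normally
    FAULT = "fault"          # device has failed and cannot work normally


class DeviceStateRecord:
    """Maps a device identifier to its last observed state (hypothetical structure)."""

    def __init__(self) -> None:
        self._states: Dict[str, DeviceState] = {}

    def update(self, device_id: str, state: DeviceState) -> None:
        self._states[device_id] = state

    def state_of(self, device_id: str) -> DeviceState:
        # Assumption: a device that was never observed is treated as non-faulty.
        return self._states.get(device_id, DeviceState.NON_FAULT)


def monitor_once(device_ids: Iterable[str],
                 probe: Callable[[str], bool],
                 record: DeviceStateRecord) -> None:
    """Run one monitoring pass; `probe` returns True when the device is healthy."""
    for device_id in device_ids:
        healthy = probe(device_id)
        new_state = DeviceState.NON_FAULT if healthy else DeviceState.FAULT
        record.update(device_id, new_state)
```

Keeping the record separate from the probe means the same lookup can later be used when an access request arrives, regardless of how the state was detected.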

Embodiment 2

[0067] Figure 4 shows the flow chart of the fault isolation method provided by the embodiment of the present invention. The method is used in a computer system interconnected by PCIe. The computer system includes a main domain and an extended domain. The main domain is formed by a root complex, a first endpoint device, and an RCEP. The extended domain is formed by the RCEP and a second endpoint device, which communicates with the root complex or the first endpoint device in the main domain through the RCEP. The method can include:

[0068] 201: Monitor a state of a second endpoint device of the extended domain.

[0069] The state of the second endpoint device includes a fault state and a non-fault state. The fault state indicates that the second endpoint device has failed and cannot work normally; the non-fault state indicates that the second endpoint device of the extended domain can work normally. The RCEP monitoring the state of the second endpoint device ...

Embodiment 3

[0108] With reference to the computer system shown in Figure 1, the fault isolation method provided by the embodiment of the present invention is shown in Figure 5. The second endpoint device 116 of the extended domain is the faulty device. Using DMA access, the first endpoint device 108 of the main domain sends a Non-Posted access request to the faulty second endpoint device 116. The access request is first routed to the RCEP 106. When the second endpoint device fails, the access request may already have crossed the boundary of the RCEP 106 (that is, it has been forwarded by the RCEP 106), or it may not yet have crossed that boundary (that is, it has not been forwarded by the RCEP 106). The method may specifically include:

[0109] 301: The RCEP 106 monitors states of all second endpoint devices in the extended domain.

[0110] The state of the device includes a fault state and a non-fault state, and the RCEP106 monito...
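The distinction drawn in this embodiment, between a request that has already crossed the RCEP boundary and one that has not, can be illustrated with a small bookkeeping sketch. It relies on the fact that a PCIe Non-Posted request expects a completion, so a forwarded request remains outstanding until that completion returns. The class `ForwardingTracker`, its method names, and the use of an integer tag per request are assumptions made for illustration; what the RCEP does with each class of request is not shown in this excerpt and is deliberately left out.

```python
from typing import Dict, Set


class ForwardingTracker:
    """Tracks Non-Posted requests forwarded to each device and not yet completed.

    Hypothetical helper: a request counts as having crossed the boundary once it
    was forwarded and its completion has not come back.
    """

    def __init__(self) -> None:
        self._outstanding: Dict[str, Set[int]] = {}

    def forwarded(self, device_id: str, tag: int) -> None:
        # Record that the request with this tag was sent on to the device.
        self._outstanding.setdefault(device_id, set()).add(tag)

    def completed(self, device_id: str, tag: int) -> None:
        # The completion arrived, so the request is no longer outstanding.
        self._outstanding.get(device_id, set()).discard(tag)

    def crossed_boundary(self, device_id: str, tag: int) -> bool:
        # True if the request was forwarded and is still awaiting completion.
        return tag in self._outstanding.get(device_id, set())
```

With this bookkeeping, the two cases described above reduce to a single lookup at the moment the device is found to be faulty.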



Abstract

The embodiments of the present invention relate to a fault isolation method, computer system, and device. The state of the second endpoint device in the extended domain is monitored, and a device state record is established according to that state. After an access request between the second endpoint device and the main domain is received, the device state record is queried according to the identification information of the second endpoint device carried in the access request, and the state of the second endpoint device is determined. If the second endpoint device is in a fault state, the access request is discarded. This blocks communication between the faulty second endpoint device and the main domain, prevents the fault from spreading to the main domain, and ensures system reliability.
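The query-and-discard step in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `AccessRequest`, `filter_request`, the string identifiers, and the plain dictionary used as the device state record are all hypothetical, and a device missing from the record is assumed to be non-faulty.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

FAULT = "fault"
NON_FAULT = "non-fault"


@dataclass(frozen=True)
class AccessRequest:
    source_id: str   # hypothetical identifier of the device issuing the request
    target_id: str   # hypothetical identifier of the device being accessed
    payload: bytes = b""


def filter_request(request: AccessRequest,
                   device_state_record: Dict[str, str],
                   extended_domain_ids: Set[str]) -> Optional[AccessRequest]:
    """Return the request if it may be forwarded, or None if it is discarded."""
    # The second endpoint device may be either the source or the target,
    # since the request can travel in either direction across the boundary.
    for device_id in (request.source_id, request.target_id):
        if device_id not in extended_domain_ids:
            continue
        # Assumption: devices absent from the record are treated as non-faulty.
        if device_state_record.get(device_id, NON_FAULT) == FAULT:
            return None  # discard: the faulty device must not reach the main domain
    return request
```

Checking both identifiers reflects that the abstract speaks of requests "between" the second endpoint device and the main domain, which covers traffic in both directions.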

Description

Technical field

[0001] Embodiments of the present invention relate to computer technology, in particular to a fault handling method, computer system and device.

Background

[0002] Peripheral Component Interconnect Express (PCIe) is a high-performance bus technology used to interconnect CPUs and peripheral devices. As a new generation of bus and interface standard, PCIe uses serial point-to-point links to transmit data, which greatly improves the transmission rate and creates conditions for higher frequencies. It is widely used in industrial servers, PCs, embedded computing and communication systems, and workstations, and is gradually replacing PCI, AGP and other buses. Currently, PCIe device failures account for a large portion of all system failures. Monitoring the system in real time, identifying the occurrence of errors, and detecting and processing system failures can effectively avoid the overall interruption of system o...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/07
CPC: G06F11/0793, G06F13/28, G06F11/0751, G06F11/0706, G06F11/0766, G06F11/0772, G06F11/0796, G06F11/0745, G06F11/3027, G06F11/3041, G06F11/3051, G06F11/3485, G06F11/349
Inventor: 林沐晖, 王俊捷, 王瑞玲
Owner: HUAWEI TECH CO LTD