The invention provides a method for generating an annotated corpus for network threat intelligence, and an electronic device. The method comprises: extracting security entities from a training set of structured threat intelligence data, mapping each piece of structured threat intelligence data to a (head entity, relation type, tail entity) triple, and obtaining a head entity set and a tail entity set; extracting security entities from the text to be labeled, and obtaining the sentences that contain at least one security entity belonging to the head entity set and at least one security entity belonging to the tail entity set; determining the relation type contained in each such sentence; and annotating each (head entity, relation type, tail entity) triple of all sentences to obtain an initial annotation data set, from which a denoised annotation data set is then obtained. Based on distant supervision theory, existing structured network threat intelligence data is used to label unlabeled corpora and thereby generate large-scale training corpora, and an automatic denoising and cross-validation method is provided to address the noisy data present in the labeled corpora.
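
The labeling steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Triple`, `build_entity_sets`, `find_entities` and `label_sentences` are hypothetical, the knowledge base and sentences are made-up toy data, entity extraction is assumed to be case-insensitive substring matching, and the relation type is assumed to be looked up from the structured triples for each (head, tail) pair found in a sentence. The denoising and cross-validation stage is not specified in this passage and is therefore omitted.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (head entity, relation type, tail entity) triple."""
    head: str
    relation: str
    tail: str


def build_entity_sets(triples):
    """Collect the head entity set and the tail entity set from the triples."""
    heads = {t.head for t in triples}
    tails = {t.tail for t in triples}
    return heads, tails


def find_entities(sentence, vocabulary):
    """Assumed extractor: case-insensitive substring match against a vocabulary."""
    lowered = sentence.lower()
    return sorted(e for e in vocabulary if e.lower() in lowered)


def label_sentences(sentences, triples):
    """Label every sentence containing a known head and a known tail entity.

    Returns (sentence, Triple) pairs forming the initial, still noisy,
    annotation data set.
    """
    heads, tails = build_entity_sets(triples)
    # Index relation types by entity pair for lookup (an assumption of this sketch).
    relations = {}
    for t in triples:
        relations.setdefault((t.head, t.tail), set()).add(t.relation)

    labeled = []
    for sentence in sentences:
        for head in find_entities(sentence, heads):
            for tail in find_entities(sentence, tails):
                for relation in sorted(relations.get((head, tail), ())):
                    labeled.append((sentence, Triple(head, relation, tail)))
    return labeled


# Toy data, invented for illustration only.
KNOWLEDGE_BASE = [Triple("MalwareA", "exploits", "VulnB")]
SENTENCES = [
    "MalwareA exploits VulnB to gain access.",
    "MalwareA was observed last week.",  # head entity only, so it is skipped
    "This sentence mentions no security entity.",
]

INITIAL_ANNOTATIONS = label_sentences(SENTENCES, KNOWLEDGE_BASE)
```

Only the first toy sentence contains both a head-set entity and a tail-set entity, so it is the only one that receives a label. A sentence that mentions both entities without actually expressing the relation would be labeled in the same way, which is the kind of noise the denoising step is meant to remove.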