Method for identifying language used in encrypted VoIP network traffic

A network traffic identification technology in the field of language analysis and discrimination. It addresses problems of prior VoIP protection schemes, such as high encryption overhead and unacceptably large delay, and achieves good discrimination accuracy and improved recognition accuracy.

Active Publication Date: 2019-08-09
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 7; Cited by: 2

AI Technical Summary

Problems solved by technology

[0006] The prior art has proposed several encryption schemes for VoIP transmission. VoIP over IPSec incurs a large encryption overhead and introduces a large delay, which is not acceptable. The Secure RTP (SRTP) protocol, supported by the National Institute of Standards and Technology (NIST) in the United States, extends RTP to provide confidentiality, authentication and integrity services for applications, and has become an IETF standard (RFC 3711: The Secure Real-time Transport Protocol)


Examples


Embodiment 1

[0042] As the most basic implementation of the present invention, this embodiment discloses a method for identifying the language used in encrypted VoIP network traffic. As shown in Figure 2, the method comprises a modeling step, an acquisition and processing step, and a comparison and identification step;

[0043] The modeling step establishes reference-language packet-length probability distribution models from the VoIP packet-length sequence features produced by different languages;
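The modeling step can be illustrated with a minimal Python sketch. It assumes, which the text does not state, that a "packet-length sequence feature" is a sliding window of n consecutive payload lengths (n = 3 mirrors the triples of Embodiment 2) and that a model is simply the empirical probability of each window. The function names `build_length_model` and `build_reference_models` are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

LengthModel = Dict[Tuple[int, ...], float]


def build_length_model(lengths: Sequence[int], n: int = 3) -> LengthModel:
    """Empirical probability distribution over windows of n consecutive packet lengths.

    Assumption (not from the patent): the sequence feature is a sliding window
    of n consecutive payload lengths.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    windows = [tuple(lengths[i:i + n]) for i in range(len(lengths) - n + 1)]
    if not windows:
        return {}  # sequence shorter than one window: no sample points
    counts = Counter(windows)
    total = len(windows)
    return {window: count / total for window, count in counts.items()}


def build_reference_models(samples: Dict[str, Sequence[int]], n: int = 3) -> Dict[str, LengthModel]:
    """One reference model per known language (hypothetical input layout:
    language label -> packet-length sequence)."""
    return {lang: build_length_model(seq, n) for lang, seq in samples.items()}


if __name__ == "__main__":
    refs = build_reference_models({"lang_a": [20, 20, 38, 20, 20, 38], "lang_b": [38, 38, 38, 20]})
    print(refs["lang_b"])  # {(38, 38, 38): 0.5, (38, 38, 20): 0.5}
```

In practice each reference sequence would come from many calls in a known language; the toy lengths above are made up for illustration only.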

[0044] The acquisition and processing step collects the language data carried by the VoIP network traffic to be identified and preprocesses it; the preprocessing classifies the language data by language type to form a target-language packet-length probability distribution model;

[0045] The comparison and identification step compares the target-language packet-length probability distribution model formed in the acquisition and processing step with the reference-language packet ...
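The comparison and identification step reduces to scoring the target model against each reference model and keeping the best score. The visible text does not name a similarity measure, so the sketch below assumes the Bhattacharyya coefficient (1.0 for identical distributions, 0.0 for disjoint ones). Models are plain dictionaries mapping a packet-length window to its probability; `bhattacharyya` and `identify_language` are hypothetical helpers, not the patent's own procedure.

```python
import math
from typing import Hashable, Mapping

Model = Mapping[Hashable, float]


def bhattacharyya(p: Model, q: Model) -> float:
    """Bhattacharyya coefficient sum(sqrt(p(x) * q(x))) over the shared support."""
    return sum(math.sqrt(pv * q[key]) for key, pv in p.items() if key in q)


def identify_language(target: Model, references: Mapping[str, Model]) -> str:
    """Compare the target model with every reference model one by one and
    return the language whose reference model is most similar."""
    if not references:
        raise ValueError("no reference models")
    scores = {lang: bhattacharyya(target, ref) for lang, ref in references.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    refs = {
        "lang_a": {(20, 20, 38): 0.5, (20, 38, 20): 0.25, (38, 20, 20): 0.25},
        "lang_b": {(38, 38, 38): 0.5, (38, 38, 20): 0.5},
    }
    target = {(20, 20, 38): 0.6, (38, 20, 20): 0.4}
    print(identify_language(target, refs))  # lang_a
```

A divergence such as Kullback-Leibler could be substituted, but it needs smoothing because windows unseen in one model have zero probability; the Bhattacharyya coefficient simply contributes nothing for them.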

Embodiment 2

[0048] As a preferred embodiment of the present invention, building on Embodiment 1, the method for identifying the language used in encrypted VoIP network traffic disclosed in this embodiment further specifies the following:

[0049] As shown in Figure 1, the reference modeling step uses known reference-language VoIP network traffic encoded with the Speex codec in narrowband mode. Following the preprocessing method of the acquisition and processing step, the preprocessed packets yield a three-dimensional time sequence of packet lengths $(S_i, S_j, S_k)$, which serves as a sample point of the reference-language packet-length probability model. VoIP runs on top of TCP/UDP, in most cases UDP. The packet length here is the UDP-layer packet length in bytes minus the fixed 8-byte UDP header; the UDP-layer packet length can be read directly from the length field in the UDP header...
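The packet-length definition above can be sketched directly: the UDP Length field (bytes 4 to 5 of the header, network byte order) counts header plus payload, so subtracting the fixed 8-byte header gives the payload length. The sketch assumes raw UDP header bytes are available, and it assumes, which the visible text does not say explicitly, that $(S_i, S_j, S_k)$ are three consecutive packet lengths. `udp_payload_length` and `length_triples` are hypothetical names.

```python
import struct
from typing import List, Sequence, Tuple

UDP_HEADER_LEN = 8  # the UDP header is a fixed 8 bytes


def udp_payload_length(udp_header: bytes) -> int:
    """Read the 16-bit Length field of a UDP header and subtract the header size."""
    if len(udp_header) < UDP_HEADER_LEN:
        raise ValueError("UDP header must be at least 8 bytes")
    # Source port, destination port, length, checksum: four big-endian uint16s.
    _src, _dst, length, _checksum = struct.unpack("!HHHH", udp_header[:UDP_HEADER_LEN])
    if length < UDP_HEADER_LEN:
        raise ValueError("invalid UDP length field")
    return length - UDP_HEADER_LEN


def length_triples(lengths: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Sample points (S_i, S_j, S_k), assumed here to be three consecutive packet lengths."""
    return [(lengths[i], lengths[i + 1], lengths[i + 2]) for i in range(len(lengths) - 2)]


if __name__ == "__main__":
    header = struct.pack("!HHHH", 5004, 5006, 46, 0)  # made-up ports and length
    print(udp_payload_length(header))  # 38
    print(length_triples([38, 38, 20, 38]))  # [(38, 38, 20), (38, 20, 38)]
```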

Embodiment 3

[0062] As a preferred embodiment of the present invention, this embodiment discloses a method for identifying the language used in encrypted VoIP network traffic, including the following steps:

[0063] Step S1, network data collection and preprocessing: capture traffic packets from the network, remove irrelevant traffic, retain the encrypted VoIP traffic, and save it to files for packet-length feature extraction. During preprocessing, confirm that the VoIP packets are carried by the SRTP protocol over UDP, and check whether SRTP uses padding (if it does, the padding bytes must be subtracted). The payload length can be obtained with the built-in functions of the packet capture software (the free software Wireshark is recommended). Export the packet length field from the capture software and save it as a CSV file for the next modeling step.
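Step S1 ends with a CSV file of packet lengths. A minimal loader is sketched below under two illustrative assumptions: the CSV has a header row with a column named `udp.length` (the Wireshark field for the UDP Length field, as produced for example by `tshark -T fields -e udp.length -E header=y -E separator=,`), and the SRTP padding length per packet is a known constant supplied by the caller. `payload_lengths_from_csv` is a hypothetical name.

```python
import csv
from typing import Iterable, List

UDP_HEADER_LEN = 8  # fixed UDP header size in bytes


def payload_lengths_from_csv(lines: Iterable[str], column: str = "udp.length",
                             padding_len: int = 0) -> List[int]:
    """Convert exported UDP Length values into SRTP payload lengths.

    Assumptions (hypothetical): `column` holds the UDP Length field, and
    `padding_len` is a fixed number of SRTP padding bytes per packet
    (0 when padding is not used).
    """
    lengths: List[int] = []
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # the field was absent for this packet
        payload = int(value) - UDP_HEADER_LEN - padding_len
        if payload > 0:  # drop empty or malformed packets
            lengths.append(payload)
    return lengths


if __name__ == "__main__":
    print(payload_lengths_from_csv(["udp.length", "46", "58"]))  # [38, 50]
```

With a real export, open the file with `newline=""` and pass the file object, e.g. `with open("lengths.csv", newline="") as fh: lengths = payload_lengths_from_csv(fh)`.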

[0064] Step S2, forming the packet length probability model of the reference language; using ...


Abstract

The invention discloses a method for identifying the language used in encrypted VoIP network traffic, and belongs to the technical field of network security. The method includes a modeling step, an acquisition and processing step, and a comparison and recognition step. In the modeling step, reference-language packet-length distribution models are built from the VoIP packet-length sequence features produced by different languages. In the acquisition and processing step, language data carried by the VoIP network traffic to be identified is collected and preprocessed; the preprocessing classifies the language data by language type into a target-language packet-length distribution model. In the comparison and recognition step, the target-language packet-length distribution model formed in the acquisition and processing step is compared one by one with each reference-language packet-length distribution model built in the modeling step, and the language whose reference model has the highest similarity to the target model is selected as the output result.

Description

Technical field

[0001] The invention belongs to the technical field of computer network security, and in particular relates to a method for analyzing and discriminating the language used in network-based encrypted VoIP data flows.

Background

[0002] Network traffic analysis and identification can detect illegal behavior, implement access control and resource allocation, and provide quality of service (QoS) guarantees; it is an important supporting technology for network operation, management and security.

[0003] Traditional traffic analysis characterizes, monitors, and predicts trends using features such as IP addresses and host names, communication ports, or entire data packets. Once traffic is encrypted, DPI (Deep Packet Inspection), which inspects packet content, is challenged. Current Internet usage reports show that more than half of Internet traffic is now encrypted, and the analysis and identification of encrypted traffic has attr...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G10L15/00, G10L15/06, H04M7/00
CPC: G10L15/005, G10L15/063, H04M7/006
Inventors: 周琨 (Zhou Kun), 汪文勇 (Wang Wenyong), 唐勇 (Tang Yong), 黄鹂声 (Huang Lisheng), 张骏 (Zhang Jun)
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA