Network Traffic Characterization Using (p,n)-grams Packet Representation

Abstract
  • With the ever-increasing advances in network protocols and traffic complexity, new challenges are emerging in traffic characterization and management. In this thesis, we propose a new approach that can complement existing ones with a simple, high-level understanding of network traffic. Our approach uses a (p,n)-gram representation to analyze network traffic, where a (p,n)-gram is an n-byte string starting at offset p. We argue that the (p,n)-gram representation combines the efficiency of using specific packet fields (e.g., ports) with the generalized pattern matching of n-grams, without the complexity and overhead of full packet pattern matching. We also show that (p,n)-grams allow traffic analysis of all packet parts (payload content, header port/flow fields, and other header fields that reflect behavior), without conflating similar patterns that happen to occur in different fields within packets. As a proof of concept, we develop a lightweight, (p,n)-gram-based unsupervised clustering algorithm (ADHIC) that makes no prior assumptions about the protocols involved. We show that ADHIC can automatically cluster network traffic, using a binary decision tree, into equivalence classes that closely approximate standard measures of network traffic. We also show that ADHIC can be used to monitor network traffic by observing the dynamic updates to the clustering tree. These incremental updates highlight temporal changes in network traffic that are not easily detected with standard network analysis methods. We then study the characteristics and distributions of (p,n)-grams in network packets, and how they can be used for traffic analysis. In particular, we argue that (p,n)-grams have an automatic fingerprinting capability: a simple frequency analysis of network packets captures structural (p,n)-grams through their relatively high frequencies. These (p,n)-grams represent protocol and sub-protocol structures as well as cross-protocol patterns.
We observe that (p,n)-grams follow a power-law-like distribution in which the structural ones make up the rapidly dropping part of the curve before the long tail. We argue that this distribution adds to the efficiency of (p,n)-gram-based traffic analysis because structural (p,n)-grams 1) form a small set that 2) can easily be distinguished from the long tail of remaining (p,n)-grams. This observation rests on a thorough empirical analysis of independent network traffic traces.
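To make the (p,n)-gram idea concrete, here is a minimal Python sketch that extracts (p,n)-grams from raw packet bytes, ranks them by the share of packets containing them, and keeps the head of the ranking up to the largest drop in frequency. It is an illustration under stated assumptions, not the thesis's implementation and not ADHIC. The function names, the once-per-packet counting, the optional `max_offset` bound, and the "largest drop" cut-off used to separate the head from the long tail are all hypothetical choices made here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# A (p, n)-gram is keyed by its offset p AND its n-byte content, so identical
# bytes at different offsets (e.g. a header field vs. the payload) stay distinct.
PNGram = Tuple[int, bytes]


def pn_grams(packet: bytes, n: int, max_offset: Optional[int] = None) -> List[PNGram]:
    """Return every (p, n)-gram of one packet: the n-byte string at each offset p."""
    last = len(packet) - n  # last offset where a full n-byte string still fits
    if max_offset is not None:
        # Hypothetical bound: only look at the first max_offset + 1 offsets.
        last = min(last, max_offset)
    return [(p, packet[p:p + n]) for p in range(last + 1)]


def rank_pn_grams(packets: Iterable[bytes], n: int,
                  max_offset: Optional[int] = None) -> List[Tuple[PNGram, float]]:
    """Rank (p, n)-grams by the fraction of packets that contain them, highest first."""
    counts: Counter = Counter()
    total = 0
    for packet in packets:
        total += 1
        # Assumption: count each (p, n)-gram at most once per packet.
        counts.update(set(pn_grams(packet, n, max_offset)))
    if total == 0:
        return []
    # Descending count, then (offset, bytes) so that ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gram, count / total) for gram, count in ranked]


def structural_head(ranked: List[Tuple[PNGram, float]]) -> List[Tuple[PNGram, float]]:
    """Keep the ranked prefix that ends just before the largest drop in frequency.

    Hypothetical heuristic, not from the thesis: it stands in for separating the
    rapidly dropping head of the distribution from its long tail.
    """
    if len(ranked) < 2:
        return list(ranked)
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(drops)), key=drops.__getitem__)  # first largest drop
    return ranked[:cut + 1]


if __name__ == "__main__":
    # Toy "packets": request lines only, not real captured traffic.
    packets = [
        b"GET /index.html HTTP/1.1",
        b"GET /about HTTP/1.1",
        b"GET /img.png HTTP/1.1",
        b"POST /login HTTP/1.1",
    ]
    ranked = rank_pn_grams(packets, n=4, max_offset=3)
    for (p, gram), share in structural_head(ranked):
        print(p, gram, share)
    # 0 b'GET ' 0.75
    # 1 b'ET /' 0.75
```

On these four toy lines, the two (p,n)-grams shared by the GET requests stand out above the single-packet ones. Real traces would be read from a capture file and would need a more careful cut-off than this single-drop rule.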

Rights Notes
  • Copyright © 2014 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2014
