CAPEC-192 - Protocol Reverse Engineering

An attacker engages in activities to decipher and/or decode protocol information for a network or application communication protocol used for transmitting information between interconnected nodes or systems on a packet-switched data network. While this type of analysis inherently involves the analysis of a networking protocol, it does not require the presence of an actual or physical network. Although certain techniques for protocol analysis benefit from manipulating live 'on-the-wire' interactions between communicating components, static or dynamic analysis techniques applied to executables as well as to device drivers such as network interface drivers, can also be used to reveal the function and characteristics of a communication protocol implementation.

Depending upon the methods used, protocol reverse engineering can involve similar methods as those employed when reverse engineering an executable, or the process may involve observing, interacting, and modifying actual communications occurring between hosts. The goal of protocol reverse engineering is to derive the data transmission syntax, as well as to extract the meaningful content, including packet or content delimiters used by the protocol. This type of analysis is often performed on closed-specification protocols, or proprietary protocols, but is also useful for analyzing publicly available specifications to determine how particular implementations deviate from published specifications.

There are several challenges inherent to protocol reverse engineering depending upon the nature of the protocol being analyzed. There may also be other types of factors which complicate the process such as encryption or ad hoc obfuscation of the protocol. In general there are two kinds of networking protocols, each associated with its own challenges and analysis approaches or methodologies. Some protocols are human-readable, which is to say they are text-based protocols. Examples of these types of protocols include HTTP, SMTP, and SOAP. Additionally, application-layer protocols can be embedded or encapsulated within human-readable protocols in the data portion of the packet. Typically, human-readable protocol implementations are susceptible to automatic decoding by the appropriate tools, such as Wireshark/ethereal, tcpdump, or similar protocol sniffers or analyzers.

The presence of well-known protocol specifications in addition to easily identified protocol delimiters, such as Carriage Return or Line Feed characters (CRLF) result in text-based protocols susceptibility to direct scrutiny through manual processes. Protocol reverse engineering against protocol implementation such as HTTP is often performed to identify idiosyncratic implementations of a protocol by a server or client. In the case of application-layer protocols which are embedded within text-based protocols, analysis techniques typically benefit from the well-known nature of the encapsulating protocols and can focus on discovering the semantic characteristics of the proprietary protocol or API, since the syntax and protocol delimiters of the underlying protocols can be readily identified.

When performing protocol analysis of machine-readable (non-text-based) protocols difficulties emerge as the protocol itself was designed to be read by computing process. Such protocols are typically composed entirely in binary with no apparent syntax, grammar, or structural boundaries. Examples of these types of protocols are IP, UDP, and TCP. Binary protocols with published specifications can be automatically decoded by protocol analyzers, but in the case of proprietary, closed-specification, binary protocols there are no immediate indicators of packet syntax such as packet boundaries, delimiters, or structure, or the presence or absence of encryption or obfuscation. In these cases there is no one technology that can extract or reveal the structure of the packet on the wire, so it is necessary to use trial and error approaches while observing application behavior based on systematic mutations introduced at the packet-level. Tools such as Protocol Debug (PDB) or other packet injection suites are often employed. In cases where the binary executable is available, protocol analysis can be augmented with static and dynamic analysis techniques.






Access to a binary executable such that it can be analyzed using static and/or dynamic tools, or the ability to observe and interact with a communication channel between communicating processes. Debugging programs such as ollydbg, SoftICE, or disassemblers such as IDA Pro. Packet sniffing or packet analyzing programs such as TCP dump, Wireshark. Specific protocol analysis tools such as PDB (Protocol Debug). Packet injection tools such as pcap or Nemesis.