CAPEC-3 - Using Leading 'Ghost' Character Sequences to Bypass Input Filters

An attacker intentionally introduces leading characters that let the input get past the filters. The targeted API ignores the leading "ghost" characters and therefore processes the attacker's input. This occurs when the targeted API accepts input data in several syntactic forms and interprets them as semantically equivalent, while the filter does not account for the full range of syntactic forms the API accepts.

Some APIs strip certain leading characters from a string of parameters. The characters may be considered redundant and removed for that reason. Another possibility is that the parser logic at the beginning of analysis is specialized in a way that causes some characters to be dropped. The attacker can place multiple types of alternative encodings at the beginning of a string as a set of probes.

One commonly used approach is to add ghost characters, that is, extra characters that do not affect the validity of the request at the API layer. If the attacker has access to the API libraries being targeted, attack ideas can be tested directly in advance. Once working ghost encodings emerge through testing, the attacker can move from lab-based API testing to testing real-world service implementations.
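The mismatch described above can be illustrated with a minimal sketch. Both functions are hypothetical stand-ins, not part of any real library: `naive_filter` rejects only inputs that literally begin with `../`, while `target_api` is assumed to ignore leading whitespace before resolving the path under an assumed base directory `/srv/data/files`.

```python
import posixpath

BASE_DIR = "/srv/data/files"  # assumed base directory, for illustration only


def naive_filter(value: str) -> bool:
    """Hypothetical filter: accept anything that does not start with '../'."""
    return not value.startswith("../")


def target_api(value: str) -> str:
    """Hypothetical permissive API: leading spaces and tabs are ignored,
    then the path is resolved relative to BASE_DIR."""
    cleaned = value.lstrip(" \t")
    return posixpath.normpath(posixpath.join(BASE_DIR, cleaned))


direct = "../../secret.txt"     # blocked by the filter
ghosted = "  ../../secret.txt"  # same request with two leading ghost spaces

print(naive_filter(direct))    # False: the literal prefix is caught
print(naive_filter(ghosted))   # True: the ghost characters hide the prefix
print(target_api(ghosted) == target_api(direct))  # True: same semantics
```

The filter and the API disagree about which strings are equivalent, and that disagreement is the whole of the weakness.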

  • Attack Methods: Injection, API Abuse
  • Purposes: Exploitation
  • Security Principles: Defense in Depth, Reluctance to Trust, Least Privilege
  • Scopes:
      • Gain privileges / assume identity: Authorization, Access_Control, Confidentiality
      • Modify application data: Integrity

Medium level:

For the semantics to remain the same, the targeted API must ignore the leading ghost characters that are used to get past the filters.

Perform input validation and filtering on data in its canonical form.

Understand the APIs to which user input will be passed and know how permissive they are. Perform appropriate input validation given that information.
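A minimal sketch of validating in canonical form, under stated assumptions: the target API is assumed to ignore leading spaces and tabs and to resolve paths under a hypothetical base directory `/srv/data/files`. The function names `canonicalize` and `is_allowed` are illustrative. The idea is that the validator applies the same normalization the API is known to apply, then checks the result rather than the raw string.

```python
import posixpath

BASE_DIR = "/srv/data/files"  # assumed base directory, for illustration only


def canonicalize(value: str) -> str:
    """Mirror the normalization the (assumed) target API performs:
    drop leading spaces/tabs, then resolve relative to BASE_DIR."""
    cleaned = value.lstrip(" \t")
    return posixpath.normpath(posixpath.join(BASE_DIR, cleaned))


def is_allowed(value: str) -> bool:
    """Decide on the canonical form, never on the raw input."""
    canonical = canonicalize(value)
    return canonical == BASE_DIR or canonical.startswith(BASE_DIR + "/")
```

With this check, `is_allowed("report.txt")` is accepted, while `"../../secret.txt"` and the ghosted variant `"  ../../secret.txt"` are both rejected, because they canonicalize to the same path outside the base directory.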

Step 1 -

Determine whether the source code is available and, if so, examine the filter logic.


Step 1 -

If the source code is not available, write a small program that loops through various possible inputs to a given API call and tries a variety of alternate (but equivalent) encodings of strings with leading ghost characters. Knowledge of the frameworks and libraries in use, and of the filters they apply, helps make this search more structured.

Step 2 -

Observe the effects. See whether the probes get past the filters. Identify a string that is semantically equivalent to the one an attacker wants to pass to the targeted API, but syntactically structured so that it gets past the input filter. That encoding will contain ghost characters that help it pass the filters and that the targeted API ignores.
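The probing loop described in these steps can be sketched as a small test harness, useful for checking one's own filter before an attacker does. Everything here is an assumption made for illustration: the prefix list `GHOST_PREFIXES`, the function `find_bypasses`, and the two demo callables `demo_filter` and `demo_api`, which stand in for a real filter and a real API. A probe counts as a bypass when the filter accepts it and the API resolves it to the same result as the original payload.

```python
import posixpath
from typing import Callable, List, Sequence

# Illustrative candidates only; a real search would be guided by
# knowledge of the framework and libraries in use.
GHOST_PREFIXES = ["", " ", "\t", "./", "/", "%20"]


def find_bypasses(
    payload: str,
    input_filter: Callable[[str], bool],
    target_api: Callable[[str], str],
    prefixes: Sequence[str] = GHOST_PREFIXES,
) -> List[str]:
    """Return probes that pass the filter and mean the same to the API."""
    expected = target_api(payload)
    bypasses = []
    for prefix in prefixes:
        probe = prefix + payload
        if probe == payload:
            continue  # the unmodified payload is not a ghost variant
        if input_filter(probe) and target_api(probe) == expected:
            bypasses.append(probe)
    return bypasses


def demo_filter(value: str) -> bool:
    """Hypothetical filter: reject only a literal leading '../'."""
    return not value.startswith("../")


def demo_api(value: str) -> str:
    """Hypothetical API: ignores leading spaces/tabs, resolves under a base."""
    cleaned = value.lstrip(" \t")
    return posixpath.normpath(posixpath.join("/srv/data/files", cleaned))


print(find_bypasses("../../secret.txt", demo_filter, demo_api))
```

Against these demo callables, the space, tab, and `./` prefixes are reported as bypasses; the `/` and `%20` prefixes pass the filter but change what the API resolves, so they are not semantically equivalent.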


Step 1 -

Once the "winning" alternate encoding using (typically leading) ghost characters is identified, an attacker can launch the attacks against the targeted API (e.g. directory traversal attack, arbitrary shell command execution, corruption of files).


Perform allowlist (white list) rather than denylist (black list) input validation.

Canonicalize all data prior to validation.

Take an iterative approach to input validation (defense in depth).
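The three recommendations above can be combined in one sketch. The policy is hypothetical: file names made of letters, digits, underscores, and hyphens with an optional extension, and the names `ALLOWED`, `canonicalize`, and `validate` are illustrative. Canonicalization here is assumed to mean percent-decoding plus trimming surrounding whitespace, repeated until the value stops changing, so that layered encodings cannot hide ghost characters from the allowlist check.

```python
import re
from urllib.parse import unquote

# Hypothetical allowlist policy: a simple file name with optional extension.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def canonicalize(value: str, max_rounds: int = 5) -> str:
    """Percent-decode and trim whitespace until the value is stable."""
    for _ in range(max_rounds):
        decoded = unquote(value).strip()
        if decoded == value:
            return value
        value = decoded
    # Still changing after max_rounds: treat as hostile rather than guess.
    raise ValueError("input did not reach a canonical form")


def validate(value: str) -> str:
    """Canonicalize first, then accept only what the allowlist permits."""
    canonical = canonicalize(value)
    if not ALLOWED.fullmatch(canonical):
        raise ValueError("input rejected by allowlist")
    return canonical
```

Returning the canonical value, and passing only that value downstream, keeps the filter and the API looking at the same string. For example, `validate("%20%20report.txt")` returns `"report.txt"`, while `"%252e%252e/secret"` decodes in two rounds to `"../secret"` and is rejected.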