NGINX Map Comparisons

Using Regular Expressions

John H Patton
Mar 19, 2021

Rationale

While trying to find a simple way to compare the values of two NGINX variables, I decided to see what could be done with regular expressions to avoid writing Lua code or something similar. The example below is tested and works fine in NGINX v1.19.0 (R22) or higher.

NGINX Maps

In order to understand the variable comparison method, it is necessary to understand what an NGINX map is and how to use it. Maps are defined at the http{} block level and are used in the same way set variables are used, except that NGINX maps are only processed when referenced. If a request flow doesn’t touch the section of a configuration where a map variable is used, that map variable lookup will not be performed. As a result, NGINX maps add no overhead to requests until NGINX needs to use the variable. NGINX maps are a great way to create low overhead variables.

An NGINX map is used to create a variable based on the value of another variable, like so:
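
A minimal sketch of such a map, using the variable names and values referred to in the explanation that follows; the exact layout is an assumption:

```nginx
# Defined at the http{} level. The lookup only runs when
# $variable_to_set is actually referenced during a request.
map $variable_to_check $variable_to_set {
    check_if_variable_matches_me   variable_matches_checked_value;
    default                        no_match;
}
```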

In the above example, $variable_to_set will be set to the value “variable_matches_checked_value” if $variable_to_check is set to the value “check_if_variable_matches_me” or it will be set to “no_match” if it doesn’t.

For the uninitiated, NGINX maps can be a little difficult to visualize and are often unused in favor of just using set to create variables. Maps are a workhorse and can provide a huge advantage when configuring NGINX. To help see what is happening inside a map, the NGINX map above can be rewritten as a bash if statement as follows:

Or even a bash switch case to help visualize with a different format:

A more complex map has multiple comparisons and map value pairs:
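
As a rough illustration, a map with several source/value pairs might look like the sketch below. The source values and results here are hypothetical placeholders, not taken from a real configuration:

```nginx
# Each line pairs a value of $variable_to_check with the value
# assigned to $variable_to_set; "default" applies when nothing matches.
map $variable_to_check $variable_to_set {
    first_value     result_for_first_value;    # hypothetical pair
    second_value    result_for_second_value;   # hypothetical pair
    third_value     result_for_third_value;    # hypothetical pair
    default         no_match;
}
```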

Matching with Regular Expressions: Volatile Warning

A regular expression (regex) is a useful way to match complex string patterns in source variables, but evaluating the expression adds some overhead. By default, that overhead is limited to a single lookup, since an NGINX map performs the lookup only once for each request being processed. However, enabling the volatile; map parameter turns off variable caching, so every use of the map variable triggers another full lookup for that request. At high volume this can degrade performance, especially if regular expressions are used.

Another side effect of volatile is that it turns off caching on dependent map variables that require a lookup on the volatile map variable. If a complex regular expression is needed, do not use the volatile; parameter in the map if the associated map variable, or any variable that depends on that map, is referenced in many places. The cost is not noticeable until significant traffic amplifies the lookups and CPU utilization rises. It is not obvious that this is the cause of the CPU increase, so treat volatile with care.

Overall, it is best to avoid using volatile; unless a lookup depends on a variable that might change during request processing. Even then, make sure the variable lookup happens during an execution phase where the correct value is guaranteed, to avoid any unexpected outcomes.
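
For reference, a sketch of where the parameter sits inside a map. The variable name $is_internal_path and the /internal/ prefix are hypothetical; $uri is used as the source because it is a variable that can change during request processing:

```nginx
map $uri $is_internal_path {
    volatile;                # disables caching: looked up again on every reference
    ~^/internal/    1;       # hypothetical path prefix
    default         0;
}
```

Without the volatile; line, the first lookup in a request would be cached and reused, even if $uri is rewritten later.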

Shun Example

An example of a complex regular expression might be validating some input before allowing it to be proxied to a vulnerable backend. The ACME Company is seeing a lot of requests with // characters in the URI and requests are bypassing protections. Patterns observed match this format: //api/product or /api//product. Additionally, periodic spikes of requests at certain times with %2f or %2F in the URI are observed, like so: /api%2Fproduct or %2fapi/product. These patterns can be matched using a map with regular expressions and the map variable can be used in a shun rule.

Matcher

The NGINX administrator might be tempted to use $uri for this matcher since it doesn’t have the query string, but this is also a work variable and the regular expression match might be evaluated after a request processing execution phase modifies the value — or even normalizes the encoding — and we will not match what was actually sent. Since NGINX will normalize the value of $uri including any encoded characters, the %2f will be decoded to ‘/’ and not match the encoded version. Using $uri is not acceptable for these reasons.

Remember, the NGINX map will only be executed when it is needed, and that may not happen at the very beginning of the request processing phases. Use $uri with care in maps.

The NGINX variable $request_uri includes the query string but is unaltered throughout the request processing phases, so we should use this instead. If an unaltered URI is needed in multiple maps, it helps to strip the query string once and reuse the result, which also keeps the later regular expressions simpler. Separating the shun rule into two maps does exactly that, so that $uri_only can be reused in other maps:
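
A sketch of the two maps, assembled from the expression breakdown in this section; the layout and the ordering of the entries are assumptions:

```nginx
# Map 1: strip the query string from the unaltered $request_uri.
map $request_uri $uri_only {
    ~^(?<u>[^\?]+)\?(?:.*)?    $u;             # URI portion captured in $u
    default                    $request_uri;   # no query string present
}

# Map 2: flag URIs containing "//" or an encoded slash (%2f or %2F).
map $uri_only $shun_if_client_is_a_baddy {
    ~\/\/      1;
    ~*%2f      1;
    default    0;
}
```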

In the above example, the regular expressions are parsed as follows:
map $request_uri $uri_only
~ : tells parser the string is to be interpreted as a regex.
^ : regular expression will match at the beginning of the string value held in $request_uri.
^(?<u>...) : creates a named capture group “?<u>” containing all characters that match the expression in the parentheses starting at the beginning of the string (^) and assigns the capture to $u. Named capture variables are best treated as local to the map: NGINX does expose them to other directives, but their value is only reliable once the map has actually been evaluated and matched.
^(?<u>[^\?]+) : square brackets are used to define the characters captured to include everything up to but not including a literal question mark “\?”; in other words, the unaltered URI portion of $request_uri.
\?(?:.*)? : match a literal question mark “\?” followed by an optional uncaptured group of any number of any characters to the end of the string. The “?:” tells the regex parser to not capture the group of characters in a backreference variable. NOTE: This part of the regular expression is not necessary since the URI will already be captured in $u and the regex parser will still match ignoring the rest of the $request_uri variable, but it has been added for completeness.

If the parser matched, $u will contain the value used to set $uri_only. If the parser didn’t match, the default line will use $request_uri to set $uri_only.

map $uri_only $shun_if_client_is_a_baddy
~ : tells parser the string is to be interpreted as a regex.
\/\/ : unconstrained match if any part of $uri_only contains two consecutive literal forward slashes. NOTE: NGINX doesn’t require the forward slashes to be escaped, but leaving them unescaped makes it difficult to validate your regular expression if you use a tool like regex101.

~* : tells parser the string is to be interpreted as a case-insensitive regex.
%2f : unconstrained match if any part of $uri_only contains the literal string “%2f”.

If either of the two regular expressions matched, the value “1” is assigned to $shun_if_client_is_a_baddy; otherwise it will be assigned “0”.

Shun Rule

To make use of the above map, the following could be placed at the top of the server{} block to execute the above two maps and trigger a response early in the request processing phases:
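
A minimal sketch of such a rule. The listen port, server name, and the 403 response code are assumptions for illustration; the article does not state which response the shun should return:

```nginx
server {
    listen 80;                   # assumed listener
    server_name example.com;     # placeholder name

    # Shun rule: the first reference to the variable triggers both maps.
    if ($shun_if_client_is_a_baddy = 1) {
        return 403;              # assumed response code
    }

    # remaining server configuration goes here
}
```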

NOTE: NGINX if statements use a single equals operator for comparisons.

The if statement doesn’t need the “= 1” portion to evaluate as true: NGINX treats any variable value other than an empty string or “0” as true. The explicit comparison is there for readability.
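
The shorter form would look like the following fragment, again with an assumed 403 response code:

```nginx
# Inside the server{} block: true when the map returned "1",
# false when it returned "0".
if ($shun_if_client_is_a_baddy) {
    return 403;   # assumed response code
}
```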

NOTE: IF IS EVIL!!! But, we’ve taken some pains to avoid the pitfalls.

Although if is evil, unless another language module is being used within the configuration, this is one of the rare exceptions, because the complexity is handled properly within the maps. This ensures the variables used do not change throughout the execution phases and leaves only a simple comparison for the if statement in the end.

If using an embedded language module, like Lua or JavaScript, care must be taken within the language module code when returns are present in the main NGINX configuration. The execution order can cause unexpected results, along with a lot of confusion and headaches. In these cases, NGINX if statements intended to shun should be moved to an access phase block for the language module.

Comparing Two Variables?

The trouble any NGINX administrator will run into is trying to compare two variables for equality using pure NGINX configuration directives. There is no simple way to do the following (nor is this desired, if is evil):
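
The kind of direct comparison meant here would presumably look like the fragment below. The variable names $thing1 and $thing2 are the ones used in the map breakdown later in this section, and the response code is an assumption. It is shown only to illustrate the intent, not as a recommended configuration:

```nginx
# Illustration of the intent only; this is the pattern the map-based
# approach in this article is meant to replace.
if ($thing1 = $thing2) {
    return 403;   # assumed response code
}
```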

Most of the time, Lua is recommended for this, but it requires some additional overhead and configuration to get working. This is not necessarily a bad approach, but it does add some complexity to the NGINX configuration and another bit of code that needs to be managed either in-line or in a separate file. However, since NGINX maps allow for PCRE regular expressions, the following is possible and quite useful:
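
A sketch of the comparison map, built from the expression breakdown in this section. The quoting and layout are assumptions:

```nginx
# Join both values with a colon, then use a PCRE back-reference
# to require that the text after the colon repeats the text before it.
map "$thing1:$thing2" $do_things_match {
    "~^([^:]+):\1$"    1;
    default            0;
}
```

One consequence of this pattern worth keeping in mind: because the first group stops at the first colon, it assumes the value of $thing1 does not itself contain a colon.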

In the above example, the regular expression is parsed as follows:
~ : tells parser the string is to be interpreted as a regex
^ : regular expression will match at the beginning of the string value held in $thing1:$thing2.
^([^:]+) : creates an unnamed capture group “\1” of all characters from the beginning of the string “^” up to but not including the colon (:) character. This ends up capturing the value of $thing1 in a back-reference variable “\1”.
:\1$ : compares the string captured in the back-reference “\1” to the value of $thing2.

Since a regular expression can also act as a truthiness indicator, if $thing1 and $thing2 match, then the expression will be a “match” -or- evaluate to “true” and will set $do_things_match to the value “1”. If $thing1 does not match $thing2, then the expression does not match and $do_things_match will be set to “0”.

Comparison with Shun Rule

So, now that the configuration is using maps for complex things and the comparison method is known, the ACME Company has discovered a problem with requests coming in that should have a query string arg (FOO) and header (X-BAR) that match, but upon inspection they do not match and this is impacting the backend system.

NOTE: A real world scenario would usually involve parsing POST body data for a value out of the payload to match against, but that is not covered in this article.

To shun this, the map will be defined as follows:
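
A sketch of what that map could look like. The map variable name $shun_if_foo_bar_mismatch is hypothetical; $arg_foo and $http_x_bar are the standard NGINX variables for the FOO query string argument and the X-BAR request header:

```nginx
# Result is "0" when FOO and X-BAR carry the same non-empty value,
# and "1" (shun) otherwise.
map "$arg_foo:$http_x_bar" $shun_if_foo_bar_mismatch {
    "~^([^:]+):\1$"    0;
    default            1;
}
```

In this sketch a request that is missing either value also falls through to the default and is shunned, which is an assumption about the desired behavior.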

And the corresponding if statement in the server{} block is as follows:
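
A sketch of a matching rule, assuming a hypothetical map variable named $shun_if_foo_bar_mismatch that is set to “1” when the two values differ, and an assumed 403 response code:

```nginx
# Inside the server{} block, alongside any other shun rules.
if ($shun_if_foo_bar_mismatch = 1) {
    return 403;   # assumed response code
}
```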

If many shun rules with heavyweight regular expressions sit at the top of the server block and performance degrades under load, it might be good to move those if statements into the location blocks where they are actually needed, so that the impact is targeted. Most of the time, these execute very quickly and the benefit of blocking malicious traffic far outweighs the cost.

Conclusion

Comparing two variables is not often needed, and should probably be avoided due to the regular expression overhead on high volume servers. However, if the alternative is writing Lua code or something similar, this will provide a simple path to solving a problem that is not easy to solve otherwise in a pure NGINX configuration.

If you have an alternative, or even a better way to do this, feel free to send a comment. Good luck!
