How to Compare Two XML Files and See What Changed
The quickest way to compare two XML files is to paste both into a side-by-side diff tool, pretty-print them the same way, and read the lines it highlights. The comparing is the easy part. The noise is what trips people up: reordered attributes, whitespace between tags, and namespace prefixes can make two files that mean the same thing look like they share nothing at all.
This guide walks through how to get a clean, trustworthy XML diff. We will look at why two equivalent documents drift apart on paper and which methods are worth knowing, then work through an example you can follow. If you just want the tool, our XML compare page does all of this in the browser.
Why XML files are deceptively hard to compare
XML has a strict grammar (see the W3C XML specification), but it gives writers a lot of latitude in how the text is laid out. Two documents can describe the exact same data and still differ byte for byte. A plain text diff does not understand any of that, so it flags all of it.
Here is the key fact to hold onto: in XML, the order of attributes on an element is not significant. The XML Information Set treats attributes as an unordered set. So <user id="7" role="admin"/> and <user role="admin" id="7"/> carry the same information, even though a line diff paints them red and green. Element order, on the other hand, usually does matter.
| What you see in the diff | Is it a real change? | What to do |
|---|---|---|
| Attributes in a different order | No, attribute order is not significant | Canonicalize both sides |
| 2-space vs 4-space indent | No | Pretty-print both the same way |
| Whitespace between elements | Usually not | Format, or strip insignificant whitespace |
| <br/> vs <br></br> | No, same empty element | Canonicalize both sides |
| A different namespace prefix for the same URI | No, prefixes are arbitrary labels | Compare by namespace URI, not prefix |
| Child elements in a different order | Usually yes, element order matters | Investigate, this is likely real |
That last row is the one to watch. Attribute order is free, but the order of child elements is part of the document in most schemas. If you want the detail on how a parser sees all this, MDN has a solid reference on parsing XML with DOMParser.
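To see the difference between attribute order and element order from a parser's point of view, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The sample documents are made up for illustration; any XML parser that exposes attributes as a map should behave the same way.

```python
import xml.etree.ElementTree as ET

# Same attributes, written in a different order.
a = ET.fromstring('<user id="7" role="admin"/>')
b = ET.fromstring('<user role="admin" id="7"/>')

# Attributes are exposed as a dict, and dict equality ignores order.
same_attributes = a.attrib == b.attrib

# Same children, written in a different order.
x = ET.fromstring("<list><item>1</item><item>2</item></list>")
y = ET.fromstring("<list><item>2</item><item>1</item></list>")

# Children come back in document order, so a reorder stays visible.
same_child_order = [c.text for c in x] == [c.text for c in y]

print(same_attributes)   # True
print(same_child_order)  # False
```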
Four ways to compare XML, and when to reach for each
There is no single best method. It depends on where the files live and what you are trying to learn. Here is how the common options stack up.
| Method | Best for | Effort | Understands XML? |
|---|---|---|---|
| Eyeballing it | Tiny files, one or two elements | Low | No, you are the parser |
| Online diff tool | Quick checks, pasting from anywhere | Low | With pretty-print, yes |
| Command line (xmllint) | Files on disk, scripting, canonical form | Medium | Yes, with --c14n |
| IDE or git diff | Files already in a repo | Low if committed | Line-based by default |
For most people a browser tool wins on speed: nothing to install, and you can paste a fragment straight from a config file or a SOAP response. The catch is formatting noise, which we deal with next. If you live on the terminal, libxml2's xmllint is the tool to know.
The fastest clean comparison, step by step
This is the routine I use when someone hands me two config files and asks "what's different?" It takes about fifteen seconds.
- Open the XML compare tool.
- Paste the original on the left, the new version on the right.
- Click Format on both sides so they share the same indentation.
- Scan for real differences. Green is added, red is removed, and a changed value shows as one of each.
- Ignore the rows that are only attribute reordering or whitespace.
Step three is most of the trick. Once both documents use the same indentation, nearly everything still highlighted is a real change; the exception is reordered attributes, which step five tells you to skip. Our diff engine is built on Google's diff-match-patch, which compares line by line first so it stays fast even on long files.
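If you would rather script the same routine, the sketch below formats both sides identically and then runs a line diff. It uses Python's standard library (difflib, not diff-match-patch), needs Python 3.9 or later for ET.indent, and the helper names formatted_lines and line_diff, along with the sample config, are made up for illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def formatted_lines(xml_text: str) -> list[str]:
    """Parse the XML and re-serialize it with a fixed two-space indent."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")  # replaces whitespace-only text between tags
    return ET.tostring(root, encoding="unicode").splitlines()


def line_diff(old_xml: str, new_xml: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines."""
    delta = difflib.ndiff(formatted_lines(old_xml), formatted_lines(new_xml))
    return [line for line in delta if line.startswith(("- ", "+ "))]


# One side is minified, the other is indented inconsistently.
old = "<config><port>8080</port><debug>false</debug></config>"
new = """<config>
        <port>8080</port>
    <debug>true</debug>
</config>"""

for line in line_diff(old, new):
    print(line)
```

Only the debug line is reported, once as removed and once as added. This handles indentation only; reordered attributes would still show up as changed lines.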
A worked example
Say you are reviewing a change to a service config. Here is the before:
<user id="7" role="editor">
  <name>Ada Lovelace</name>
  <active>true</active>
  <seats>3</seats>
</user>
And here is the after, as a teammate handed it to you:
<user role="admin" id="7">
  <name>Ada Lovelace</name>
  <active>true</active>
  <seats>5</seats>
  <team>platform</team>
</user>
Drop those into a raw line diff and the whole first line is flagged, partly because id and role swapped places. Format both, compare by meaning, and the real story is short:
| Node | Before | After | Change |
|---|---|---|---|
| @role | editor | admin | Modified |
| seats | 3 | 5 | Modified |
| team | (none) | platform | Added |
| @id | 7 | 7 | No change (just moved) |
| name | Ada Lovelace | Ada Lovelace | No change |
Three real edits: a role bump, a seat count, and a new team element. The attribute swap was noise. That promotion from editor to admin is exactly the kind of thing you want to catch in review, and it is easy to miss when it is buried under a line the diff wrongly flagged.
Canonical XML: the proper way to ignore noise
Formatting both sides handles indentation, but there is a standard built for the rest of the noise. Canonical XML, defined by the W3C in Canonical XML 1.1, rewrites a document into a single normalized form: attributes sorted, empty elements expanded, whitespace in tags normalized, and default attributes made explicit. Two documents that differ only in those respects produce identical canonical output. It is the XML equivalent of sorting JSON keys. Note what it leaves alone: whitespace in element content (including indentation) and namespace prefixes are kept as written.
xmllint --c14n old.xml > old.c14n.xml
xmllint --c14n new.xml > new.c14n.xml
diff old.c14n.xml new.c14n.xml
Now diff no longer reports attribute order, empty-tag style, or spacing inside tags, because both files have been normalized the same way. If you just want readable indentation instead of strict canonical form, xmllint --format file.xml pretty-prints it, which is the terminal equivalent of clicking Format in the browser.
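If xmllint is not available, Python's standard library can do something similar. A caveat: xml.etree.ElementTree.canonicalize (Python 3.8+) implements Canonical XML 2.0, not the 1.1 version discussed above, so treat this as a sketch of the idea and not a drop-in replacement. The sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

compact = '<user id="7" role="admin"><note/></user>'
reordered = '<user   role="admin"  id="7"><note></note></user>'
indented = '<user id="7" role="admin">\n  <note/>\n</user>'

# Attribute order, spacing inside tags, and empty-element style are normalized.
canon_compact = ET.canonicalize(compact)
canon_reordered = ET.canonicalize(reordered)
print(canon_compact)                     # <user id="7" role="admin"><note></note></user>
print(canon_compact == canon_reordered)  # True

# Whitespace between elements is content, so canonical form keeps it...
print(ET.canonicalize(indented) == canon_compact)                   # False

# ...unless you opt in to stripping it (a C14N 2.0 option).
print(ET.canonicalize(indented, strip_text=True) == canon_compact)  # True
```

The third comparison is the one to remember: canonicalizing does not reindent, so format or strip whitespace as well if the two files are laid out differently.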
Namespaces: the part that confuses everyone
XML namespaces let two documents use the same vocabulary with different prefix labels. <ns1:user> bound to a URI and <u:user> bound to the same URI are the same element; the prefix is just a local nickname. A text diff sees ns1 versus u and flags a change that is not one. The fix is to compare by the namespace URI rather than the prefix. Canonical XML 1.1 does not do this for you, because it keeps prefixes as written; you need a namespace-aware parser, or a canonicalizer that can rewrite prefixes (Canonical XML 2.0 defines an optional prefix-rewriting mode). The Namespaces in XML spec is the reference if you need to settle an argument about it.
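Here is a small sketch of comparing by URI with Python's xml.etree.ElementTree, which expands every element name to {uri}local form when parsing. The example.com URIs are placeholders. The last lines use Python's canonicalize, which keeps prefixes as written unless you pass rewrite_prefixes=True.

```python
import xml.etree.ElementTree as ET

xml_a = '<ns1:user xmlns:ns1="http://example.com/users" id="7"/>'
xml_b = '<u:user xmlns:u="http://example.com/users" id="7"/>'
xml_c = '<u:user xmlns:u="http://example.com/other" id="7"/>'

a, b, c = (ET.fromstring(doc) for doc in (xml_a, xml_b, xml_c))

# The parser drops the prefix and keeps the namespace URI.
print(a.tag)           # {http://example.com/users}user
print(a.tag == b.tag)  # True: different prefix, same URI
print(b.tag == c.tag)  # False: same prefix, different URI

# Canonical output only matches once prefixes are rewritten consistently.
plain_match = ET.canonicalize(xml_a) == ET.canonicalize(xml_b)
rewritten_match = (
    ET.canonicalize(xml_a, rewrite_prefixes=True)
    == ET.canonicalize(xml_b, rewrite_prefixes=True)
)
print(plain_match, rewritten_match)
```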
Common gotchas to watch for
| Gotcha | Why it bites | Fix |
|---|---|---|
| Character encoding | A UTF-8 and a UTF-16 file can hold the same text but differ byte for byte | Normalize encoding; the XML declaration states it |
| Entity references | &amp; and the character reference &#38; both stand for the same character | Canonicalize, which resolves entities consistently |
| CDATA vs escaped text | <![CDATA[a<b]]> and a&lt;b are the same text content | Compare the parsed value, not the raw bytes |
| Significant whitespace | Inside xml:space="preserve", spaces matter and must not be stripped | Do not blindly trim; respect xml:space |
| Self-closing tags | <x/> and <x></x> are identical | Canonicalize so both render the same way |
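Most of these gotchas disappear once you compare parsed values instead of raw bytes. The sketch below checks a few rows of the table with Python's xml.etree.ElementTree; the helper text_of and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def text_of(xml_source):
    """Return the parsed text content of the root element ('' if empty)."""
    return ET.fromstring(xml_source).text or ""


# CDATA vs escaped text: both parse to the three characters a<b.
cdata_same = text_of("<e><![CDATA[a<b]]></e>") == text_of("<e>a&lt;b</e>")

# Named entity vs numeric character reference for the ampersand.
entity_same = text_of("<e>R&amp;D</e>") == text_of("<e>R&#38;D</e>")

# Self-closing vs expanded empty element.
empty_same = text_of("<x/>") == text_of("<x></x>")

# Same text in two encodings: different bytes, same parsed value.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><n>caf\u00e9</n>'.encode("utf-8")
utf16_doc = '<?xml version="1.0" encoding="UTF-16"?><n>caf\u00e9</n>'.encode("utf-16")
bytes_differ = utf8_doc != utf16_doc
encoding_same = text_of(utf8_doc) == text_of(utf16_doc)

print(cdata_same, entity_same, empty_same, bytes_differ, encoding_same)
```

Note that this parser keeps whitespace in text as is, so the xml:space row still needs care if you strip whitespace yourself afterwards.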
Text diff vs structural diff
Everything above is a text diff: fast, visual, and perfect for a person reading a change. A structural diff goes further and describes the change in terms of the XML tree: this attribute changed, that child element was inserted at this path. You want a structural diff when a program needs to apply the change or when element order genuinely does not matter and you want it ignored. For day-to-day review, a text diff of two formatted documents is plenty.
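To make the idea concrete, here is a deliberately small structural diff in Python, run on the user config from the worked example. It is a sketch under strong assumptions: it looks at one element's attributes and its direct children only, and it assumes each child tag appears at most once. The function name diff_element is hypothetical; real structural diff tools handle nesting, repeated elements, and ordering.

```python
import xml.etree.ElementTree as ET


def diff_element(before, after):
    """List (node, old, new) for attributes and direct children that differ."""
    changes = []
    # Attributes are compared by name, so their order never matters.
    for name in sorted(set(before.attrib) | set(after.attrib)):
        old, new = before.attrib.get(name), after.attrib.get(name)
        if old != new:
            changes.append((f"@{name}", old, new))
    # Children are matched by tag (assumes each tag is unique here).
    old_children = {c.tag: (c.text or "").strip() for c in before}
    new_children = {c.tag: (c.text or "").strip() for c in after}
    for tag in sorted(set(old_children) | set(new_children)):
        old, new = old_children.get(tag), new_children.get(tag)
        if old != new:
            changes.append((tag, old, new))
    return changes


before = ET.fromstring(
    '<user id="7" role="editor"><name>Ada Lovelace</name>'
    "<active>true</active><seats>3</seats></user>"
)
after = ET.fromstring(
    '<user role="admin" id="7"><name>Ada Lovelace</name>'
    "<active>true</active><seats>5</seats><team>platform</team></user>"
)

for node, old, new in diff_element(before, after):
    print(node, old, new)
```

It reports @role, seats, and team, and stays silent about the attribute swap. A value of None means the node is absent on that side.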
Related tools
XML is rarely the only format you deal with. If you are comparing API payloads, JSON compare applies the same idea to JSON. Marked-up pages are easier to read on the HTML compare page, and environment settings line up well on the config compare tool.
Frequently asked questions
- Does comparing XML files online upload them anywhere?
- On comparetext.org the diff runs in your browser. The two XML files are compared by JavaScript on your own machine, so nothing is sent to a server unless you explicitly click Save or Share. That makes it safe for config files, SOAP messages, and other data you would not want to paste into a site that uploads on every keystroke.
- Why do my two XML files show every line as different?
- Almost always it is formatting, not real changes. One file is minified or indented with tabs, the other with two spaces, or the attributes are in a different order. Click Format on both sides so they use the same indentation. After that the diff usually shrinks to the handful of values that genuinely changed. For a stricter normalization, canonicalize both files with xmllint --c14n first.
- Does attribute order matter when comparing XML?
- No. In XML the attributes on an element are an unordered set, so <a x="1" y="2"/> and <a y="2" x="1"/> are equivalent. A plain text diff does not know this and will flag the reorder as a change. Canonical XML sorts attributes into a fixed order, so canonicalizing both sides before comparing makes the false positive disappear. Element order, by contrast, usually is significant.
- How do I compare XML while ignoring namespace prefixes?
- Namespace prefixes are local labels for a namespace URI, so ns1:user and u:user bound to the same URI are the same element. To compare correctly, normalize by URI rather than by prefix. Canonicalizing with xmllint --c14n is not enough on its own, because Canonical XML keeps prefixes as written; use a namespace-aware parser or a structural diff that matches elements by URI and local name. A raw text diff cannot do this.
- Can I compare large XML files without the page freezing?
- Yes, up to a point. A line-mode diff stays fast on files with thousands of lines because it compares whole lines first instead of every character. Very large files (several megabytes) are better handled with a command-line tool like xmllint or git diff, which are not limited by a browser tab's memory and responsiveness. For anything you can comfortably scroll through in a browser, an online diff is the quicker option.
- What is the difference between a text diff and a structural diff of XML?
- A text diff compares the files line by line, the same way it would compare two essays. A structural diff understands the XML tree, so it knows that a reordered attribute is not a change and can report an inserted element by its path. Text diffs are faster and good enough for most reviews once both sides are formatted. Structural diffs matter when a program needs to apply the change or when you want element order ignored.
Ready to try it? Paste your files into the XML compare tool and see what changed.