How to Compare Two XML Files and See What Changed

The quickest way to compare two XML files is to paste both into a side-by-side diff tool, pretty-print them the same way, and read the lines it highlights. The comparing is the easy part. The noise is what trips people up: reordered attributes, whitespace between tags, and namespace prefixes can make two files that mean the same thing look like they share nothing at all.

This guide walks through how to get a clean, trustworthy XML diff. We will look at why two equivalent documents drift apart on paper and which methods are worth knowing, then work through an example you can follow. If you just want the tool, our XML compare page does all of this in the browser.

Why XML files are deceptively hard to compare

XML has a strict grammar (see the W3C XML specification), but it gives writers a lot of latitude in how the text is laid out. Two documents can describe the exact same data and still differ byte for byte. A plain text diff does not understand any of that, so it flags all of it.

Here is the key fact to hold onto: in XML, the order of attributes on an element is not significant. The XML Information Set treats attributes as an unordered set. So <user id="7" role="admin"/> and <user role="admin" id="7"/> carry the same information, even though a line diff paints them red and green. Element order, on the other hand, usually does matter.
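To see the attribute-order point in practice, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The variable names are illustrative only; the two elements are the ones from the paragraph above.

```python
import xml.etree.ElementTree as ET

first_text = '<user id="7" role="admin"/>'
second_text = '<user role="admin" id="7"/>'

first = ET.fromstring(first_text)
second = ET.fromstring(second_text)

# Element.attrib is a plain dict, so equality ignores the order
# in which the attributes were written in the source text.
same_attributes = first.attrib == second.attrib

# A text comparison sees two different strings.
same_text = first_text == second_text

print(same_attributes)  # True
print(same_text)        # False
```

A parser-level comparison agrees with the Infoset view, while a byte-level comparison does not, which is the whole source of the false positive.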

Looks like a change, usually isn't
| What you see in the diff | Is it a real change? | What to do |
| --- | --- | --- |
| Attributes in a different order | No, attribute order is not significant | Canonicalize both sides |
| 2-space vs 4-space indent | No | Pretty-print both the same way |
| Whitespace between elements | Usually not | Format, or strip insignificant whitespace |
| <br/> vs <br></br> | No, same empty element | Canonicalize both sides |
| A different namespace prefix for the same URI | No, prefixes are arbitrary labels | Compare by namespace URI, not prefix |
| Child elements in a different order | Usually yes, element order matters | Investigate, this is likely real |

That last row is the one to watch. Attribute order is free, but the order of child elements is part of the document in most schemas. If you want the detail on how a parser sees all this, MDN has a solid reference on parsing XML with DOMParser.

Four ways to compare XML, and when to reach for each

There is no single best method. It depends on where the files live and what you are trying to learn. Here is how the common options stack up.

| Method | Best for | Effort | Understands XML? |
| --- | --- | --- | --- |
| Eyeballing it | Tiny files, one or two elements | Low | No, you are the parser |
| Online diff tool | Quick checks, pasting from anywhere | Low | With pretty-print, yes |
| Command line (xmllint) | Files on disk, scripting, canonical form | Medium | Yes, with --c14n |
| IDE or git diff | Files already in a repo | Low if committed | Line-based by default |

For most people a browser tool wins on speed: nothing to install, and you can paste a fragment straight from a config file or a SOAP response. The catch is formatting noise, which we deal with next. If you live on the terminal, libxml2's xmllint is the tool to know.

The fastest clean comparison, step by step

This is the routine I use when someone hands me two config files and asks "what's different?" It takes about fifteen seconds.

  1. Open the XML compare tool.
  2. Paste the original on the left, the new version on the right.
  3. Click Format on both sides so they share the same indentation.
  4. Scan for real differences. Green is added, red is removed, and a changed value shows as one of each.
  5. Ignore the rows that are only attribute reordering or whitespace.

Step three is most of the trick. Once both documents use the same indentation, the only thing left to highlight is what actually changed. Our diff engine is built on Google's diff-match-patch, which compares line by line first so it stays fast even on long files.
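The format-then-compare routine can be sketched outside the browser as well. The following is a rough Python illustration, not the site's engine: it swaps in the standard library's difflib for the line diff, assumes Python 3.9 or later for ET.indent, and uses two made-up sample documents (OLD and NEW). The helper names pretty_lines and formatted_diff are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET

def pretty_lines(xml_text: str) -> list[str]:
    """Parse the document and re-indent it with two spaces per level."""
    root = ET.fromstring(xml_text)
    # ET.indent rewrites whitespace-only text between elements,
    # so both inputs end up with the same layout.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode").splitlines()

def formatted_diff(old_xml: str, new_xml: str) -> list[str]:
    """Line diff of two documents after formatting both the same way."""
    return list(difflib.unified_diff(
        pretty_lines(old_xml), pretty_lines(new_xml),
        "old", "new", lineterm=""))

# Minified on one side, four-space indented on the other.
OLD = '<user id="7"><name>Ada</name><seats>3</seats></user>'
NEW = '<user id="7">\n    <name>Ada</name>\n    <seats>5</seats>\n</user>'

for line in formatted_diff(OLD, NEW):
    print(line)
```

Without the formatting step every line of these two inputs would differ; with it, only the seats line is reported. Attribute order is not touched by this step, so a reordered attribute would still show up.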

A worked example

Say you are reviewing a change to a service config. Here is the before:

<user id="7" role="editor">
  <name>Ada Lovelace</name>
  <active>true</active>
  <seats>3</seats>
</user>

And here is the after, as a teammate handed it to you:

<user role="admin" id="7">
  <name>Ada Lovelace</name>
  <active>true</active>
  <seats>5</seats>
  <team>platform</team>
</user>

Drop those into a raw line diff and the very first line is flagged, but you cannot tell at a glance whether that is only because id and role swapped places or because a value changed too. Format both, compare by meaning, and the real story is short:

What actually changed
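| Node | Before | After | Change |
| --- | --- | --- | --- |
| @role | editor | admin | Modified |
| seats | 3 | 5 | Modified |
| team | | platform | Added |
| @id | 7 | 7 | No change (just moved) |
| name | Ada Lovelace | Ada Lovelace | No change |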

Three real edits: a role bump, a seat count, and a new team element. The attribute swap was noise. That promotion from editor to admin is exactly the kind of thing you want to catch in review, and it is easy to miss when it shares a line with an attribute reorder that looks like the whole explanation.
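The same summary can be derived programmatically. This is a minimal sketch under a strong assumption: the document is a flat record, with one root element whose children have unique names and hold only text. The helpers flatten and summarize are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

BEFORE = """<user id="7" role="editor">
  <name>Ada Lovelace</name>
  <active>true</active>
  <seats>3</seats>
</user>"""

AFTER = """<user role="admin" id="7">
  <name>Ada Lovelace</name>
  <active>true</active>
  <seats>5</seats>
  <team>platform</team>
</user>"""

def flatten(xml_text):
    """Map '@attribute' and child element names to their values."""
    root = ET.fromstring(xml_text)
    values = {"@" + name: value for name, value in root.attrib.items()}
    for child in root:
        values[child.tag] = (child.text or "").strip()
    return values

def summarize(before_xml, after_xml):
    """Return {node: (change, before, after)} for nodes that differ."""
    before, after = flatten(before_xml), flatten(after_xml)
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes[key] = ("Added", None, after[key])
        elif key not in after:
            changes[key] = ("Removed", before[key], None)
        elif before[key] != after[key]:
            changes[key] = ("Modified", before[key], after[key])
    return changes

for node, change in summarize(BEFORE, AFTER).items():
    print(node, change)
```

Because attributes are read into a dictionary, the id and role swap never registers; only @role, seats, and team are reported.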

Canonical XML: the proper way to ignore noise

Formatting both sides handles indentation, but there is a standard built for exactly this problem. Canonical XML, defined by the W3C in Canonical XML 1.1, rewrites a document into a single normalized form: attributes sorted, empty elements expanded into start and end tags, whitespace inside tags normalized, character and entity references resolved, and default attributes made explicit. Two documents that differ only in those ways produce identical canonical output. It is the XML equivalent of sorting JSON keys. Two things it deliberately leaves alone are whitespace in element content (including indentation) and namespace prefixes.

xmllint --c14n old.xml > old.c14n.xml
xmllint --c14n new.xml > new.c14n.xml
diff old.c14n.xml new.c14n.xml

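Now diff reports far less noise, because both files have been normalized the same way. Canonical XML keeps whitespace between elements, so if the two files are indented differently, pretty-print them the same way first. xmllint --format file.xml does that, and it is the terminal equivalent of clicking Format in the browser. (xmllint's --c14n flag emits Canonical XML 1.0 and --c14n11 emits 1.1; for whole-document comparisons like this one the differences between them rarely matter.)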
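If you would rather script this than shell out to xmllint, Python's standard library offers something similar. One caveat up front: xml.etree.ElementTree.canonicalize (Python 3.8 and later) implements Canonical XML 2.0, not the 1.1 version discussed above, so treat this as a sketch of the idea and not a drop-in replacement. The sample strings are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

old_xml = '<user role="admin" id="7"><note/></user>'
new_xml = '<user id="7"   role="admin" ><note></note></user>'

canonical_old = canonicalize(old_xml)
canonical_new = canonicalize(new_xml)

# Attributes are sorted and the empty element is expanded.
print(canonical_old)                   # <user id="7" role="admin"><note></note></user>
print(canonical_old == canonical_new)  # True

# Indentation is content as far as canonicalization is concerned...
indented = "<user>\n    <seats>3</seats>\n</user>"
minified = "<user><seats>3</seats></user>"
layout_still_differs = canonicalize(indented) != canonicalize(minified)

# ...unless you opt in to stripping surrounding whitespace (a C14N 2.0 option).
layout_ignored = (canonicalize(indented, strip_text=True)
                  == canonicalize(minified, strip_text=True))

print(layout_still_differs)  # True
print(layout_ignored)        # True
```

The strip_text option is convenient for config-style documents, but it would be the wrong choice for content where leading or trailing spaces are meaningful.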

Namespaces: the part that confuses everyone

XML namespaces let two documents use the same vocabulary with different prefix labels. <ns1:user> bound to a URI and <u:user> bound to the same URI are the same element; the prefix is just a local nickname. A text diff sees ns1 versus u and flags a change that is not one. The fix is to compare by the namespace URI rather than the prefix. Canonical XML 1.x does not do this for you: it tidies up namespace declarations but keeps the prefixes as written, so you need a namespace-aware parser or a structural diff that compares expanded names. The Namespaces in XML spec is the reference if you need to settle an argument about it.
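Comparing by URI is what a namespace-aware parser gives you. As a small sketch, Python's ElementTree stores every element name as {uri}local, so the prefix disappears on parsing. The namespace URI https://example.com/schema below is a placeholder invented for this example. The second half uses rewrite_prefixes, an option of the C14N 2.0 implementation in Python's standard library, to get prefix-independent text that a line diff can work with.

```python
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import canonicalize

# Same (hypothetical) namespace URI, different prefixes.
first_xml = '<ns1:user xmlns:ns1="https://example.com/schema" id="7"/>'
second_xml = '<u:user xmlns:u="https://example.com/schema" id="7"/>'

first = ET.fromstring(first_xml)
second = ET.fromstring(second_xml)

# The parser expands each name to {namespace-uri}local-name.
print(first.tag)                # {https://example.com/schema}user
print(first.tag == second.tag)  # True

# Plain canonicalization keeps the prefixes, so the outputs still differ.
prefixes_kept = canonicalize(first_xml) != canonicalize(second_xml)

# Rewriting prefixes to generated ones makes the two outputs identical.
prefixes_rewritten = (canonicalize(first_xml, rewrite_prefixes=True)
                      == canonicalize(second_xml, rewrite_prefixes=True))

print(prefixes_kept)       # True
print(prefixes_rewritten)  # True
```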

Common gotchas to watch for

| Gotcha | Why it bites | Fix |
| --- | --- | --- |
| Character encoding | A UTF-8 and a UTF-16 file can hold the same text but differ byte for byte | Normalize encoding; the XML declaration states it |
| Entity references | &amp; and &#38; both mean an ampersand, and &#233; and a literal é are the same character | Canonicalize, which resolves entities consistently |
| CDATA vs escaped text | <![CDATA[a<b]]> and a&lt;b are the same text content | Compare the parsed value, not the raw bytes |
| Significant whitespace | Inside xml:space="preserve", spaces matter and must not be stripped | Do not blindly trim; respect xml:space |
| Self-closing tags | <x/> and <x></x> are identical | Canonicalize so both render the same way |
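"Compare the parsed value, not the raw bytes" is easy to check for yourself. Here is a short Python sketch using the standard library parser; the element names expr and name are made up for the example.

```python
import xml.etree.ElementTree as ET

# CDATA section vs escaped text: different bytes, same parsed content.
cdata = ET.fromstring("<expr><![CDATA[a<b]]></expr>")
escaped = ET.fromstring("<expr>a&lt;b</expr>")
print(cdata.text)                  # a<b
print(cdata.text == escaped.text)  # True

# Numeric character reference vs the literal character (U+00E9).
char_ref = ET.fromstring("<name>Caf&#233;</name>")
literal = ET.fromstring("<name>Caf\u00e9</name>")
print(char_ref.text == literal.text)  # True
```

Once the parser has resolved references and CDATA sections, the two spellings are indistinguishable, which is why a comparison of parsed values does not trip over them.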

Text diff vs structural diff

Everything above is a text diff: fast, visual, and perfect for a person reading a change. A structural diff goes further and describes the change in terms of the XML tree: this attribute changed, that child element was inserted at this path. You want a structural diff when a program needs to apply the change or when element order genuinely does not matter and you want it ignored. For day-to-day review, a text diff of two formatted documents is plenty.
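For the case where element order should be ignored, one possible approach is to reduce each element to an order-insensitive signature and compare those. This is a minimal sketch, not a full structural diff: the signature function is a hypothetical helper, it ignores mixed content (text that follows a child element), and it trims whitespace around text.

```python
import xml.etree.ElementTree as ET

def signature(elem):
    """Return a hashable summary of an element that ignores child order."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),   # attribute order ignored
        (elem.text or "").strip(),            # indentation ignored
        tuple(sorted(signature(child) for child in elem)),  # child order ignored
    )

a = ET.fromstring("<user><name>Ada</name><seats>3</seats></user>")
b = ET.fromstring("<user><seats>3</seats><name>Ada</name></user>")
c = ET.fromstring("<user><seats>5</seats><name>Ada</name></user>")

print(signature(a) == signature(b))  # True: same children, different order
print(signature(a) == signature(c))  # False: seats changed from 3 to 5
```

This answers "are they equivalent?" but not "what changed where?"; a real structural diff tool also reports the path of each difference.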

Related tools

XML is rarely the only format you deal with. If you are comparing API payloads, JSON compare applies the same idea to JSON. Marked-up pages are easier to read on the HTML compare page, and environment settings line up well on the config compare tool.

Frequently asked questions

Does comparing XML files online upload them anywhere?
On comparetext.org the diff runs in your browser. The two XML files are compared by JavaScript on your own machine, so nothing is sent to a server unless you explicitly click Save or Share. That makes it safe for config files, SOAP messages, and other data you would not want to paste into a site that uploads on every keystroke.
Why do my two XML files show every line as different?
Almost always it is formatting, not real changes. One file is minified or indented with tabs, the other with two spaces, or the attributes are in a different order. Click Format on both sides so they use the same indentation. After that the diff usually shrinks to the handful of values that genuinely changed. For a stricter normalization, canonicalize both files with xmllint --c14n first.
Does attribute order matter when comparing XML?
No. In XML the attributes on an element are an unordered set, so <a x="1" y="2"/> and <a y="2" x="1"/> are equivalent. A plain text diff does not know this and will flag the reorder as a change. Canonical XML sorts attributes into a fixed order, so canonicalizing both sides before comparing makes the false positive disappear. Element order, by contrast, usually is significant.
How do I compare XML while ignoring namespace prefixes?
Namespace prefixes are local labels for a namespace URI, so ns1:user and u:user bound to the same URI are the same element. To compare correctly, normalize by URI rather than by prefix. Note that xmllint --c14n keeps prefixes as written, so canonicalizing alone will not hide a prefix change. Use a namespace-aware parser or a structural diff that compares each name as URI plus local name, or rewrite both documents to use the same prefixes before diffing. A raw text diff cannot do this on its own.
Can I compare large XML files without the page freezing?
Yes, up to a point. A line-mode diff stays fast on files with thousands of lines because it compares whole lines first instead of every character. Very large files (several megabytes) are better handled with a command-line tool like xmllint or git diff, which run outside the browser and do not have to render every line in a page. For anything you can comfortably scroll through in a browser, an online diff is the quicker option.
What is the difference between a text diff and a structural diff of XML?
A text diff compares the files line by line, the same way it would compare two essays. A structural diff understands the XML tree, so it knows that a reordered attribute is not a change and can report an inserted element by its path. Text diffs are faster and good enough for most reviews once both sides are formatted. Structural diffs matter when a program needs to apply the change or when you want element order ignored.

Ready to try it? Paste your files into the XML compare tool and see what changed.