
Remove Duplicate Lines: Dedupe Text Online

Paste any list, strip exact duplicates, keep the order or sort the result. Case and whitespace toggles included.

What this tool does

A free, in-browser utility for removing duplicate lines from a single block of text. Paste your input, pick whether to preserve the original order or sort the output, and the duplicates are gone. Nothing leaves your machine.

It is the GUI equivalent of sort -u on the command line, or awk '!seen[$0]++' when you want to keep the first occurrence in original order. If you have ever pasted a 4,000-line email export into a terminal just to run sort | uniq, this is the same thing without the terminal.

Dedup is exact and line-based. Two lines have to match character for character (subject to your case and trim toggles) to count as duplicates. Fuzzy matching, near-duplicate detection, and similarity scoring are deliberately out of scope. Different problem, different tool.

How dedup actually works here

Internally the tool walks each line once and tracks seen values in a hash-backed set, the same data structure as JavaScript's Set or Python's set. Lookups are O(1) on average, so a million lines run in under a second on a normal laptop. The underlying algorithm is plain hash-table membership.

Two modes. Preserve order keeps the first occurrence of each line and drops every later duplicate, the way awk '!seen[$0]++' behaves. Sort and dedup sorts the whole input alphabetically first and emits each unique line once, the way sort -u behaves. Pick whichever your downstream consumer expects.
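As a rough illustration of the two modes, here is a minimal Python sketch. The function names are hypothetical and this is not the tool's actual source; it only mirrors the behaviour described above using a hash-backed set.

```python
def dedupe_preserve_order(lines):
    """Keep the first occurrence of each line, in input order."""
    seen = set()      # hash-backed membership test, O(1) on average
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def dedupe_sorted(lines):
    """Emit each unique line once, in sorted order."""
    # Note: sorted() orders by Unicode code point; sort -u follows the locale.
    return sorted(set(lines))


lines = ["pear", "apple", "pear", "fig", "apple"]
print(dedupe_preserve_order(lines))  # ['pear', 'apple', 'fig']
print(dedupe_sorted(lines))          # ['apple', 'fig', 'pear']
```

Both functions touch each line once for the membership check; the sorted variant adds the cost of the sort itself.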

There are toggles for the comparison itself. Case-insensitive treats lines that differ only in letter case, such as Alice and alice, as the same line. Trim whitespace strips leading and trailing spaces and tabs before comparing, so " example", "example ", and "example" all collapse to one entry. Both are off by default because the safe behaviour is to match exactly what was typed.
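One way to picture the toggles is as a key function applied before the set lookup. The sketch below is an assumption about how such toggles could be wired, not the tool's code: the names comparison_key and dedupe are hypothetical, casefold() stands in for whatever case folding the tool uses, and emitting the first original line (rather than the trimmed one) is also an assumption.

```python
def comparison_key(line, ignore_case=False, trim=False):
    """Build the value that is actually compared for duplicates."""
    key = line
    if trim:
        key = key.strip(" \t")   # leading/trailing spaces and tabs only
    if ignore_case:
        key = key.casefold()     # assumption: aggressive Unicode case folding
    return key


def dedupe(lines, ignore_case=False, trim=False):
    """Preserve-order dedup that compares keys but emits original lines."""
    seen = set()
    unique = []
    for line in lines:
        key = comparison_key(line, ignore_case, trim)
        if key not in seen:
            seen.add(key)
            unique.append(line)  # assumption: keep the first line as typed
    return unique


print(dedupe(["Alice", "alice"]))                              # ['Alice', 'alice']
print(dedupe(["Alice", "alice"], ignore_case=True))            # ['Alice']
print(dedupe([" example", "example ", "example"], trim=True))  # [' example']
```

Internal whitespace is untouched in this sketch, so "a  b" and "a b" remain distinct keys.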

How to remove duplicate lines in three steps

One pane in, one pane out. Nothing is uploaded.

  1. Paste your lines

    Drop the text into the input pane. Any list works: emails, IPs, URLs, SKUs, log timestamps, package names, anything one-per-line. Click Sample to load an example with intentional duplicates if you want to see the result first.

  2. Pick your options

    Choose Preserve order (keeps the first occurrence, like awk '!seen[$0]++') or Sort and dedup (alphabetic, like sort -u). Toggle Case-insensitive if Alice and alice should collapse, and Trim whitespace if leading or trailing spaces should be ignored.

  3. Copy or download the result

    The right pane shows the deduped output and a count of how many lines were removed. Click Copy to grab it, or Download to save it as a .txt file. Both panes scroll independently so you can spot which lines went missing.

When you want this tool

Cleaning a marketing email list before import

Export your list from one CRM, splice in another from a webinar signup, paste both into the tool, dedupe with Case-insensitive and Trim whitespace on. Mailchimp and HubSpot generally deduplicate by email address on import anyway, but doing it first lets you review exactly which entries were dropped before anything reaches the CRM.

Deduping IP addresses in an audit log

Pull the source IPs from a week of auth.log with awk, paste them in, dedupe in preserve-order mode. You get a clean list of unique source addresses without losing the rough chronological signal of "who showed up first". Filter it down to the addresses you actually consider hostile before feeding it into your firewall block list.

Cleaning a list of URLs scraped from multiple pages

Scraping a sitemap and a paginated index will return the same URL multiple times. Drop the combined list in, sort and dedupe, and you have a canonical crawl frontier. Trailing slashes and query strings still count as different lines, so normalise those first if you want them merged.
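If you want trailing slashes or query strings merged, one option is to normalise each URL before deduping. The sketch below uses Python's urllib.parse; the helper name normalise_url and its drop_query flag are hypothetical, and whether dropping the query string is safe depends on the site (paginated URLs often differ only by query).

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url, drop_query=False):
    """Strip a trailing slash from the path and optionally drop the query."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"   # keep a bare "/" for the site root
    query = "" if drop_query else parts.query
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


urls = [
    "https://example.com/docs/",
    "https://example.com/docs",
    "https://example.com/docs?page=2",
]

print(sorted({normalise_url(u) for u in urls}))
# ['https://example.com/docs', 'https://example.com/docs?page=2']

print(sorted({normalise_url(u, drop_query=True) for u in urls}))
# ['https://example.com/docs']
```

Run the normalisation first, then paste the result into the tool, or dedupe the normalised set directly as shown.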

Reducing pip freeze or npm ls noise to unique packages

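pip freeze across two virtualenvs, or npm ls --all across a monorepo, prints the same package on many lines. Concatenate, paste, dedupe, and you have one line per name==version (or name@version for npm) for a quick eyeball of what is actually installed. For npm ls, strip the tree-drawing prefixes first, since the indentation makes otherwise identical entries compare as different lines.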

CSV rows pasted from multiple sources

Works for line-level dedup, which is what you usually need when each row is a self-contained record. Heads up: this is plain text dedup, not column-aware. If you want true row-level dedup that respects quoted commas and ignores column order, use a real CSV tool. For most pasted-from-spreadsheet cases, line dedup is enough.
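For comparison, here is a small sketch of what a quoting-aware row dedup might look like with Python's csv module. The function name dedupe_csv_rows is hypothetical, and this version still treats column order as significant; it only shows how parsing makes differently quoted but equal rows match.

```python
import csv
import io


def dedupe_csv_rows(text):
    """Drop repeated rows, comparing parsed fields rather than raw lines."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    seen = set()
    for row in reader:
        key = tuple(row)         # parsed fields, so quoting differences vanish
        if key not in seen:
            seen.add(key)
            writer.writerow(row)
    return out.getvalue()


text = 'name,city\nLee,Paris\n"Lee",Paris\n"Smith, J",Oslo\n'
print(dedupe_csv_rows(text))
# name,city
# Lee,Paris
# "Smith, J",Oslo
```

Plain line dedup would keep both Lee rows here, because the raw text of the two lines differs.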

Cleaning a copied bibliography or citation list

Copying references from three browser tabs into one document leaves you with the same DOI repeated four times. Paste in, turn on Trim whitespace (because copy-paste from PDFs loves to add stray spaces), dedupe, and you get a clean references section. Sort mode is handy here for alphabetical reference lists.

Dedup quick reference

The edge cases that bite most often when deduplicating text. Worth scanning once before you trust the output.

| Topic | What this tool does |
| --- | --- |
| Order preservation | Preserve-order keeps the first occurrence and drops later dupes, like awk '!seen[$0]++' or Python list(dict.fromkeys(lines)). Sort mode is alphabetic, like sort -u. |
| Case sensitivity | Case-sensitive by default (the Case-insensitive toggle is off). Alice and alice are distinct unless Case-insensitive is on. Email and username lists usually want it on; SKU lists usually do not. |
| Whitespace trimming | Off by default. " example", "example ", and "example" are three different lines until Trim whitespace is on. Internal spaces are never touched. |
| Blank lines | Treated as a normal line value: the empty string. After dedup, you keep one blank line if any existed in the input. To strip every blank line, use a separate whitespace-cleaner step. |
| Line endings (CRLF vs LF) | A line ending in \r\n is technically different from one ending in \n if the trailing \r survives splitting. We split on \r?\n, so mixed endings collapse. If you see ghost duplicates, your input probably has stray \r characters mid-line. |
| Unicode normalisation | Two visually identical strings can compare unequal if one uses NFC and the other NFD (precomposed vs decomposed accents). This tool does no normalisation. If you suspect this, normalise both inputs to NFC first with String.prototype.normalize('NFC') or Python unicodedata.normalize. |
| Trailing newline at end of input | A file that ends with \n has a phantom empty last line. Most editors hide this. We treat it as one blank line, deduped with any other blank lines. Output preserves a single trailing newline by convention. |
| Encoding | UTF-8 throughout. The browser handles decoding when you paste; if your bytes were Latin-1 or Windows-1252 originally, characters outside ASCII may be wrong before the dedup even runs. Convert encoding upstream. |
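Two of these edge cases, line endings and Unicode normalisation, are easy to reproduce. The Python sketch below splits on \r?\n as described in the table and shows an NFC pass applied upstream of the dedup; the helper names split_lines and to_nfc are hypothetical.

```python
import re
import unicodedata


def split_lines(text):
    """Split on LF or CRLF so a trailing \\r never survives into a line."""
    return re.split(r"\r?\n", text)


def to_nfc(line):
    """Normalise to NFC so composed and decomposed accents compare equal."""
    return unicodedata.normalize("NFC", line)


composed = "caf\u00e9"       # é as a single code point
decomposed = "cafe\u0301"    # e followed by a combining acute accent
text = composed + "\r\n" + decomposed + "\n" + composed

lines = split_lines(text)
print(len(set(lines)))                         # 2: looks identical, compares unequal
print(len({to_nfc(line) for line in lines}))   # 1: equal after NFC

print(split_lines("a\nb\n"))                   # ['a', 'b', ''] (phantom last line)
```

The last call shows the phantom empty line that a trailing newline produces.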

Remove duplicate lines: frequently asked questions

Does this preserve the original order or sort the output?

Both, your choice. Preserve order keeps the first occurrence of each line and drops every later duplicate, so the surviving lines keep their original relative order. Sort and dedup sorts the whole result alphabetically. Preserve order is what you want when the order carries meaning (chronological logs, ranked lists). Sort is what you want when you just need a clean unique set.

Is the comparison case-sensitive?

By default yes, because that is the safe assumption. Alice and alice are different lines unless you turn on Case-insensitive. Most email systems treat addresses as case-insensitive on the local part, so for email lists you almost always want this toggle on. Same goes for usernames on case-insensitive platforms. For SKUs and identifiers that are genuinely case-sensitive, leave it off.

Can it ignore leading and trailing whitespace?

Yes, with Trim whitespace. It strips leading and trailing spaces and tabs before comparing, so " example ", "example ", and "example" all collapse into a single entry. Useful when your input was hand-edited or copy-pasted from a PDF, both of which leave stray spaces. Internal whitespace inside a line is left alone.

How is this different from the unix uniq command?

The uniq command only collapses adjacent duplicates, which is a common surprise. Two identical lines separated by a different line will both survive uniq. That is why the unix idiom is sort | uniq or sort -u: you have to sort first so duplicates are next to each other. This tool does not need a sorted input, because it tracks every seen line in a hash set as it goes.
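The difference is easy to see side by side. This Python sketch imitates uniq's adjacent-only collapsing with itertools.groupby and contrasts it with a whole-input dedup; both function names are hypothetical.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Collapse runs of identical neighbouring lines, the way uniq does."""
    return [key for key, _group in groupby(lines)]


def dedupe_all(lines):
    """Drop every later duplicate regardless of position, keeping order."""
    return list(dict.fromkeys(lines))


lines = ["a", "a", "b", "a"]
print(uniq_adjacent(lines))  # ['a', 'b', 'a']  (the last 'a' survives)
print(dedupe_all(lines))     # ['a', 'b']
```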

Does it count how many duplicates each line had?

No, that is a different feature. If you need counts, the unix command for it is uniq -c after a sort: sort input.txt | uniq -c | sort -rn gives you a frequency table sorted by count, which is what you want when finding the most common entries. This tool focuses on cleanup, not analysis.
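If you would rather do the counting in a script than in a shell, a rough Python equivalent of that pipeline uses collections.Counter. The function name line_counts is hypothetical, and the order of ties may differ from what sort -rn prints.

```python
from collections import Counter


def line_counts(lines):
    """Return (line, count) pairs, most frequent first."""
    return Counter(lines).most_common()


lines = ["b", "a", "b", "c", "a", "b"]
for line, count in line_counts(lines):
    print(count, line)
# 3 b
# 2 a
# 1 c
```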

How big an input can I paste?

A few million lines is fine. The dedup itself is O(n) with constant-time hash lookups, so the bottleneck is the browser rendering the result pane, not the dedup. For very large inputs (50 MB+), expect a noticeable pause when the textarea repaints. If your input is that big, you almost certainly already have sort -u available; use it.

Privacy and how this works

Your text never leaves your browser. The dedup runs on your machine, locally, against a JavaScript Set. No analytics on your input, no logs, no cloud round-trip. The whole tool is a few lines of code: split on newlines, walk through, keep what we have not seen. The standard library primitives behind it are documented at MDN's Set reference and the equivalent Python set docs.