the "same image" problem

From Eli the Bearded <*@eli.users.panix.com>
Newsgroups comp.os.linux.misc
Subject the "same image" problem
Date 2025-07-20 19:58 +0000
Organization Some absurd concept
Message-ID <eli$2507201536@qaz.wtf>


I have tens of thousands of photos, mostly mine, spanning decades.
During that time there has been a lot of opportunity for images to get
into my collection in different ways. Like, I take a photo, resize it
to post on a website, then download an archive of my activity from that
website a few years later and now I maybe have three copies of the
image, each with different MD5/SHA1/whatever hash:

	1. My original
	2. My resized version
	3. The re-encoded version from the website archive

Or I have an image from a backup of my phone, which I then later
changed the tags on, so the EXIF data differs. (These I can _usually_
identify by filename matches. But some have filenames too generic
for that to work.)

Or I have a physical photo that I have scanned from both a print and
from the negative at different times. 

Or I have a photo I shared with family and then they sent it back
a few years later as a reminder, with it getting re-encoded each time.

I would like a tool that can scan my collection and easily help me find
visually similar images that may not be exactly pixel-for-pixel
identical, and that for 100% sure are not byte-for-byte identical on
disk.

It's been about ten years since I last looked for such a tool and I
wasn't really happy with the ones for Linux back then. Best I remember
was "Perceptual Hash" ( https://www.phash.org/ -- last release 2013 ).
The output was a number, but it could only compare images pairwise,
which doesn't scale well.
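
A minimal sketch of how a hash of this general kind can work, using a
simple difference hash (dHash) rather than the DCT-based hash pHash is
known for. It assumes the image has already been decoded into a 2D list
of grayscale values (decoding is left to whatever image library is at
hand), and the names shrink, dhash and hamming are made up for
illustration:

```python
def shrink(pixels, width, height):
    """Box-average a 2D grayscale grid down to width x height cells."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            cell = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def dhash(pixels, size=8):
    """Difference hash: one bit per 'left cell is brighter than right cell'."""
    small = shrink(pixels, size + 1, size)  # 9 x 8 cells -> 64 bits
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Because each image reduces to one integer, the hashes can be computed
once and stored, then bucketed or indexed instead of re-reading every
pair of files. Two hashes with a small Hamming distance are candidates
for being the same picture; what counts as "small" is a threshold that
has to be tuned against the collection.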

Anything people like these days?

Elijah
------
has not tried using phash in a long time
