the "same image" problem

Path csiph.com!weretis.net!feeder9.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From Eli the Bearded <*@eli.users.panix.com>
Newsgroups comp.os.linux.misc
Subject the "same image" problem
Date Sun, 20 Jul 2025 19:58:30 -0000 (UTC)
Organization Some absurd concept
Message-ID <eli$2507201536@qaz.wtf> (permalink)
Injection-Date Sun, 20 Jul 2025 19:58:30 -0000 (UTC)
Injection-Info reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5"; logging-data="6397"; mail-complaints-to="abuse@panix.com"
User-Agent Vectrex rn 2.1 (beta)
X-Liz It's actually happened, the entire Internet is a massive game of Redcode
X-Motto "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress Moronic Fucks.
X-Attribution EtB
XFrom is a real address
Encrypted double rot-13
Xref csiph.com comp.os.linux.misc:69793

I have tens of thousands of photos, mostly mine, spanning decades.
During that time there has been a lot of opportunity for images to get
into my collection in different ways. Like, I take a photo, resize it
to post on a website, then download an archive of my activity from that
website a few years later and now I maybe have three copies of the
image, each with different MD5/SHA1/whatever hash:

	1. My original
	2. My resized version
	3. The re-encoded version from the website archive

Or I have an image from a backup of my phone, which I then later
changed the tags on, so the exif data differs. (These I can _usually_
identify by filename matches. But some have filenames too generic
for that to work.)

Or I have a physical photo that I have scanned from both a print and
from the negative at different times. 

Or I have a photo I shared with family and then they sent it back
a few years later as a reminder, with it getting re-encoded each time.

I would like a tool that can scan my collection and help me easily find
visually similar images that may not be exactly pixel-for-pixel
identical, and for 100% sure are not byte-for-byte identical on disk.

It's been about ten years since I last looked for such a tool and I
wasn't really happy with the ones for Linux back then. Best I remember was
"Perceptual Hash" ( https://www.phash.org/ -- last release 2013 ). The
output was a number, but it could only compare images pairwise, which doesn't
scale well.
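
To make the idea concrete, here is a rough sketch of the general approach:
hash each image once, then compare the small integer hashes by Hamming
distance instead of comparing the images themselves. This is a plain
"difference hash" (dHash), a simpler technique than whatever pHash does
internally, and not its algorithm. It assumes each image has already been
decoded to a grid of grayscale values (a library such as Pillow could do
that step; it is left out here). The names dhash, hamming, similar_pairs
and synthetic, the file names, and the distance threshold of 10 are all
made up for illustration.

```python
from itertools import combinations


def dhash(gray, hash_w=8, hash_h=8):
    """Difference hash of a grayscale grid (list of rows of 0-255 ints).

    The grid is sampled down to (hash_w + 1) x hash_h points by nearest
    neighbour; each bit records whether a sample is brighter than the
    sample to its right. Size and uniform brightness changes drop out.
    """
    src_h, src_w = len(gray), len(gray[0])
    bits = 0
    for y in range(hash_h):
        row = gray[y * src_h // hash_h]
        samples = [row[x * src_w // (hash_w + 1)] for x in range(hash_w + 1)]
        for left, right in zip(samples, samples[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def similar_pairs(hashes, max_distance=10):
    """Return (name_a, name_b, distance) for hashes that are close.

    Still a pairwise loop, but over integers, not over decoded images.
    """
    pairs = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming(hash_a, hash_b)
        if distance <= max_distance:
            pairs.append((name_a, name_b, distance))
    return pairs


def synthetic(width, height, shade, offset=0):
    """Stand-in for a decoded photo: a horizontal gradient (test data only)."""
    return [[int(shade(x / width)) + offset for x in range(width)]
            for _ in range(height)]


collection = {
    "original.jpg": synthetic(90, 80, lambda u: 200 * u),
    # Half the size and slightly brighter, like a resized, re-encoded copy.
    "resized.jpg": synthetic(45, 40, lambda u: 200 * u, offset=3),
    # A different picture: the gradient runs the other way.
    "other.jpg": synthetic(90, 80, lambda u: 200 * (1 - u)),
}
hashes = {name: dhash(pixels) for name, pixels in collection.items()}
print(similar_pairs(hashes))
# [('original.jpg', 'resized.jpg', 0)]
```

The hashing pass is one step per file and the hashes can be cached, so
only the cheap integer comparison is left as a pairwise job. For a very
large collection that loop would probably want replacing with an index
(a BK-tree is one option), which this sketch does not attempt.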

Anything people like these days?

Elijah
------
has not tried using phash in a long time
