| From | Eli the Bearded <*@eli.users.panix.com> |
|---|---|
| Newsgroups | comp.os.linux.misc |
| Subject | the "same image" problem |
| Date | 2025-07-20 19:58 +0000 |
| Organization | Some absurd concept |
| Message-ID | <eli$2507201536@qaz.wtf> |

I have tens of thousands of photos, mostly mine, spanning decades. During that time there has been a lot of opportunity for images to get into my collection in different ways. Like, I take a photo, resize it to post on a website, then download an archive of my activity from that website a few years later and now I maybe have three copies of the image, each with a different MD5/SHA1/whatever hash:

1. My original
2. My resized version
3. The re-encoded version from the website archive

Or I have an image from a backup of my phone, which I then later changed the tags on, so the exif data differs. (These I can _usually_ identify by filename matches. But some have filenames too generic for that to work.)

Or I have a physical photo that I have scanned from both a print and from the negative at different times.

Or I have a photo I shared with family and then they sent it back a few years later as a reminder, each time it getting re-encoded.

I would like a tool that can scan my collection and easily help me find visually similar images which may not be exactly pixel for pixel identical, and for 100% sure are not byte for byte identical on disk.

It's been about ten years since I last looked for such a tool and I wasn't really happy with the ones for Linux back then. Best I remember was "Perceptual Hash" ( https://www.phash.org/ -- last release 2013 ). The output was a number, but it could only compare images pairwise, which doesn't scale well.

Anything people like these days?

Elijah
------
has not tried using phash in a long time
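
To make the scaling complaint concrete, here is a minimal sketch of one way to avoid comparing every image against every other. It is not the pHash library's algorithm: it swaps in a simple 64-bit "difference hash" and then buckets hashes by 16-bit bands, so only images sharing at least one band are compared. All function names (`dhash`, `hamming`, `load_gray`, `similar_pairs`) are made up for illustration, and `load_gray` assumes the Pillow library is installed.

```python
from collections import defaultdict


def dhash(gray, width=9, height=8):
    """Difference hash of a row-major grayscale grid (bytes or list of ints).

    Each bit records whether a pixel is brighter than its right neighbour,
    giving (width - 1) * height bits: 64 for the default 9x8 grid.
    """
    bits = 0
    for y in range(height):
        row = gray[y * width:(y + 1) * width]
        for x in range(width - 1):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def load_gray(path, width=9, height=8):
    """Decode an image and shrink it to a tiny grayscale grid.

    Assumption: Pillow is available; it is imported lazily so the rest
    of this sketch works without it.
    """
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("L").resize((width, height)).tobytes()


def similar_pairs(hashes, max_distance=3):
    """Find pairs of 64-bit hashes within max_distance bits of each other.

    hashes maps a name (e.g. a file path) to its hash. Each hash is split
    into four 16-bit bands. Two hashes differing in at most 3 bits must
    agree exactly on at least one band (pigeonhole), so only names that
    share a bucket are ever compared.
    """
    if max_distance > 3:
        raise ValueError("four bands only guarantee matches up to 3 bits")
    buckets = defaultdict(list)
    for name, h in hashes.items():
        for band in range(4):
            buckets[(band, (h >> (16 * band)) & 0xFFFF)].append(name)
    pairs = {}
    for names in buckets.values():
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                d = hamming(hashes[a], hashes[b])
                if d <= max_distance:
                    pairs[tuple(sorted((a, b)))] = d
    return pairs
```

In use, one would build `hashes = {path: dhash(load_gray(path)) for path in paths}` once and then call `similar_pairs(hashes)`. The 3-bit threshold is a guess, not a tuned value; a looser threshold needs more, narrower bands. A hash this simple will likely tolerate resizing and re-encoding better than cropping, or than two separate scans of a print and its negative.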