| From | Anonymous <poster@anon.com> |
|---|---|
| Newsgroups | rocksolid.shared.tor |
| Subject | Re: Very good research on detection of hidden services |
| Date | 2021-04-03 06:30 -0700 |
| Organization | rocksolid2 (novabbs.org) |
| Message-ID | <to.484.4eihky@anon.com> |
| References | <to.482.199atu@anon.com> |
I got curious now, so I coded a little spider script to discover onion sites. You can find the code below; I also attach an archive in case it gets messed up. Current features of the spider:
-collects v2/v3 addresses of hidden services that host webservers
-collects all links found on any visited site
-single-threaded (because I am not after speed, and I don't want to DDoS the tor network or any site)
-gives status update after each run (statistics)
-uses lynx (-dump -listonly) to retrieve the sites and to handle all the messy html parsing and link extraction
-one run = one link only --> process can be interrupted at any time without data loss
-visits all valid http/https-links on onion domains, while trying to avoid any file download (images, pdfs, ...)
-uses a 2-step random choice algorithm to select the next link to visit: first a random host, then a random link on that host. As a result, hosts with a lot of links have the same chance of being chosen as hosts with only one link.
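The 2-step selection can be sketched on its own. This is a minimal standalone demo with made-up file names and fake onion hosts, not part of the spider itself:

```shell
#!/bin/bash
# Demo of the 2-step random choice: step 1 picks a host uniformly
# at random, step 2 picks one of that host's links. A host with
# three links therefore has the same chance as a host with one.
cat > demo_links.txt <<'EOF'
http://aaaaaaaaaaaaaaaa.onion/page1
http://aaaaaaaaaaaaaaaa.onion/page2
http://aaaaaaaaaaaaaaaa.onion/page3
http://bbbbbbbbbbbbbbbb.onion/only-page
EOF
# step 1: random host (dedup first so link counts don't matter)
host=$(grep -E -o 'http://[[:alnum:]]{16,56}\.onion' demo_links.txt | sort -u | sort -R | head -n 1)
# step 2: random link belonging to that host
link=$(grep -F "$host" demo_links.txt | sort -R | head -n 1)
echo "picked $link"
```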
I might add other features later, like link chains (this would allow mapping the existing services and how they link to each other).
Of course, such programs already exist, and more sophisticated and faster ones at that. This is more about learning, and about reaching the goal with minimal resources. So, here it comes:
#!/bin/bash
###############################################################
################## v2/v3 scanner ##############################
# a little bash script to scan the darkweb for tor addresses
# adjust the proxy variables in /etc/lynx/lynx.cfg if needed
# then just call the script from a subdir, and let it run
# you will need a file called "links.txt" containing onion
# links to start the script off (there are some popular
# addresses preloaded).
# the script runs in an endless loop until there are no more
# links to be followed. you can stop anytime with CTRL+C
# without losing anything but the currently
# running request. Another start of the script, and it will
# pickup where it was interrupted. all of this is
# EXPERIMENTAL CODE, USE AT OWN RISK
###############################################################
counter=0
touch ./visited_links.txt
while true
do
counter=$((counter + 1))
grep -E 'http:\/\/[[:alnum:]]{16,56}\.onion.*' links.txt | sort -u > links_new.txt && mv links_new.txt links.txt
visited_link_count=$(wc -l < visited_links.txt)
link_count=$(wc -l < links.txt)
service_count=$(grep -E -o 'http:\/\/[[:alnum:]]{16,56}\.onion' links.txt | sort -u | wc -l)
visited_service_count=$(grep -E -o 'http:\/\/[[:alnum:]]{16,56}\.onion' visited_links.txt | sort -u | wc -l)
echo -e " we have visited $visited_link_count links and discovered $visited_service_count onion addresses, we still have $link_count unique links on $service_count hosts\n\r"
# step 1: pick a random host (not a random link, so link-heavy sites get no extra weight)
service=$(grep -E -o 'http:\/\/[[:alnum:]]{16,56}\.onion' links.txt | sort -u | sort -R | head -n 1)
# step 2: pick a random link on that host
link=$(grep -F "$service" links.txt | sort -R | head -n 1)
if [ -z "$link" ]; then
echo "no more links, we are done"
exit 0
fi
echo -e "trying to get $link in run number $counter,\n\r"
echo "$link" >> visited_links.txt
sort -u visited_links.txt > visited_links_new.txt && mv visited_links_new.txt visited_links.txt
comm -23 links.txt visited_links.txt > links_new.txt && mv links_new.txt links.txt
lynx -dump -listonly "$link" | grep -E 'http:\/\/[[:alnum:]]{16,56}\.onion.*' | grep -E -i -v '\.jpg$|\.gif$|\.png$|\.pdf$|\.mp3$|\.m3u$|\.avi$|\.jpeg$|\.bmp$|\.mkv$' | cut -f2- -d '.' | cut -c2- >> links.txt
done
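The bookkeeping at the end of each run relies on comm -23, which prints the lines that appear only in the first file; since both files are kept sorted, this subtracts the visited links from the pending list. A tiny standalone demo with throwaway data:

```shell
# comm -23 keeps lines unique to the first file: pending minus
# visited. Both inputs must be sorted, which is why the script
# keeps its link files sorted at all times.
printf '%s\n' a b c d | sort > all.txt
printf '%s\n' b d | sort > seen.txt
comm -23 all.txt seen.txt   # prints: a c
```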
Content of the file links.txt (seed file):
http://tor66sewebgixwhcqfnp5inzp5x5uohhdy3kvtnyfxc2e5mxiuh34iid.onion/
http://suprbayoubiexnmp.onion/
http://3bbad7fauom4d6sgppalyqddsqbf5u5p56b5k5uk2zxsy3d6ey2jobad.onion/
http://tordexu73joywapk2txdr54jed4imqledpcvcuf75qsas2gwdgksvnyd.onion/
http://torchdeedp3i2jigzjdmfpn5ttjhthh5wbmda2rr3jvqjg5p77c54dqd.onion/
http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page
http://phobosxilamwcg75xt22id7aywkzol6q6rfl2flipcqoc4e4ahima5id.onion/search?query=linklist
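The {16,56} quantifier in the grep pattern is what lets the same regex catch both v2 (16-character) and v3 (56-character) addresses; a quick check against two of the seed links:

```shell
# Extract bare onion addresses from full URLs, using the same
# pattern as the script (v2 = 16 chars, v3 = 56 chars).
printf '%s\n' \
  'http://suprbayoubiexnmp.onion/' \
  'http://tor66sewebgixwhcqfnp5inzp5x5uohhdy3kvtnyfxc2e5mxiuh34iid.onion/' \
  | grep -E -o 'http://[[:alnum:]]{16,56}\.onion'
```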
I might publish the addresses I found at one point.