


Re: Very good research on detection of hidden services

From Anonymous <poster@anon.com>
Newsgroups rocksolid.shared.tor
Subject Re: Very good research on detection of hidden services
Date 2021-04-03 06:30 -0700
Organization rocksolid2 (novabbs.org)
Message-ID <to.484.4eihky@anon.com> (permalink)
References <to.482.199atu@anon.com>




I got curious, so I coded a little spider script to discover onion sites. You can find the code below, and I also attach an archive in case it gets messed up. Current features of the spider:
-collects v2/v3 addresses of hidden services that host webservers
-collects all links found on any visited site
-single-threaded (because I am not after speed, and I don't want to DDoS the tor network or any site)
-gives a status update after each run (statistics)
-uses lynx for retrieval of the sites and all the messy HTML parsing and link extraction (-dump -listonly)
-one run = one link only --> the process can be interrupted at any time without data loss
-visits all valid http/https links on onion domains, while trying to avoid file downloads (images, PDFs, ...)
-uses a two-step random selection to pick the next link to visit: first a random service, then a random link on that service. As a result, pages with many links have the same chance of being chosen as pages with only one link.
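The two-step selection can be sketched in isolation like this (a minimal demo against a made-up seed file /tmp/demo_links.txt; `sort -R | head -n 1` does the random draw, as in the script below):

```shell
# Hypothetical standalone demo of the two-step random selection:
# step 1 draws a random service from the set of unique hosts,
# step 2 draws a random link belonging to that service, so a host
# with 100 links is picked no more often than a host with one link.
cat > /tmp/demo_links.txt <<'EOF'
http://aaaaaaaaaaaaaaaa.onion/page1
http://aaaaaaaaaaaaaaaa.onion/page2
http://aaaaaaaaaaaaaaaa.onion/page3
http://bbbbbbbbbbbbbbbb.onion/only
EOF
# step 1: unique service addresses, one chosen at random
service=$(grep -E -o 'http://[[:alnum:]]{16,56}\.onion' /tmp/demo_links.txt | sort -u | sort -R | head -n 1)
# step 2: a random link on that service (-F: match the address literally)
link=$(grep -F "$service" /tmp/demo_links.txt | sort -R | head -n 1)
echo "$link"
```

The echoed link is always one of the seed lines and always starts with the chosen service address.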

I might add other features later, like link chains (this would make it possible to map the existing services and how they link to each other).
Of course, such programs already exist, and more sophisticated and faster ones at that. This is more about learning, and about reaching the goal with minimal resources. So, here it comes:

#!/bin/bash
###############################################################
################## v2/v3 scanner ##############################
# a little bash script to scan the darkweb for tor addresses
# adjust the proxy variables in /etc/lynx/lynx.cfg if needed,
# then just call the script from a subdir and let it run.
# you will need a file called "links.txt" containing onion
# links to start the script off (there are some popular
# addresses preloaded).
# the script runs in an endless loop until there are no more
# links to be followed. you can stop anytime with CTRL+C
# without losing anything but the currently
# running request. on the next start, the script will
# pick up where it was interrupted. all of this is
# EXPERIMENTAL CODE, USE AT OWN RISK
###############################################################
counter=0
touch ./visited_links.txt
while true
do
	counter=$((counter+1))
	sort -u links.txt | grep -E 'http:\/\/[[:alnum:]]{16,56}\.onion.*' | sort > links_new.txt && mv links_new.txt links.txt
	visited_link_count=$(wc -l visited_links.txt | cut -f1 -d ' ')
	link_count=$(wc -l links.txt | cut -f1 -d ' ')
	service_count=$(grep -E -o 'http:\/\/[[:alnum:]]{16,56}\.onion' links.txt | sort -u | wc -l)
	visited_service_count=$(grep -E -o 'http:\/\/[[:alnum:]]{16,56}\.onion' visited_links.txt | sort -u | wc -l)
	echo -e "we have visited $visited_link_count links and discovered $visited_service_count onion addresses, we still have $link_count unique links on $service_count hosts\n"
	# two-step selection: pick a random service first, then a random link on it
	service=$(grep -E -o 'http:\/\/[[:alnum:]]{16,56}\.onion' links.txt | sort -u | sort -R | head -n 1)
	link=$(grep -F "$service" links.txt | sort -R | head -n 1)
	if [ -z "$link" ]; then
		echo "no more links, we are done"
		exit 0
	fi
	echo -e "trying to get $link in run number $counter\n"
	echo "$link" >> visited_links.txt
	sort -u visited_links.txt | sort > visited_links_new.txt && mv visited_links_new.txt visited_links.txt
	comm -23 links.txt visited_links.txt > links_new.txt && mv links_new.txt links.txt
	lynx -dump -listonly "$link" | grep -E 'http:\/\/[[:alnum:]]{16,56}\.onion.*' | grep -E -i -v '\.jpg$|\.gif$|\.png$|\.pdf$|\.mp3$|\.m3u$|\.avi$|\.jpeg$|\.bmp$|\.mkv$' | cut -f2- -d '.' | cut -c2- >> links.txt
done
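As an aside, the two cut stages at the end of the loop strip the numeric prefix that lynx -dump -listonly puts in front of every URL. A quick check of that transformation on a made-up sample line:

```shell
# lynx -listonly emits lines like "   1. http://...": cutting at the
# first '.' drops the list number, and cut -c2- drops the leading space
# that remains in front of the URL.
printf '   1. http://aaaaaaaaaaaaaaaa.onion/page\n' | cut -f2- -d '.' | cut -c2-
```

which prints http://aaaaaaaaaaaaaaaa.onion/page (the dots inside the URL are untouched because -f2- keeps every field from the second one onward).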

Content of the file links.txt (seed file):

http://tor66sewebgixwhcqfnp5inzp5x5uohhdy3kvtnyfxc2e5mxiuh34iid.onion/
http://suprbayoubiexnmp.onion/
http://3bbad7fauom4d6sgppalyqddsqbf5u5p56b5k5uk2zxsy3d6ey2jobad.onion/
http://tordexu73joywapk2txdr54jed4imqledpcvcuf75qsas2gwdgksvnyd.onion/
http://torchdeedp3i2jigzjdmfpn5ttjhthh5wbmda2rr3jvqjg5p77c54dqd.onion/
http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page
http://phobosxilamwcg75xt22id7aywkzol6q6rfl2flipcqoc4e4ahima5id.onion/search?query=linklist
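For completeness, the proxy lines in /etc/lynx/lynx.cfg mentioned in the header might look like the following. This is an assumption, not part of the original post: lynx speaks HTTP proxies rather than SOCKS, so these lines point it at a local HTTP-to-Tor bridge (e.g. Privoxy on its default port 8118, configured to forward to Tor's SOCKS port).

```
http_proxy:http://127.0.0.1:8118/
https_proxy:http://127.0.0.1:8118/
```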

I might publish the addresses I found at one point.



Thread

Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-03-29 09:19 -0700
  Re: Very good research on detection of hidden services Retro Guy <retroguy@novabbs.com> - 2021-03-29 22:42 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-03 06:30 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-04 08:57 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-04 09:01 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-04 13:44 -0700
    Re: Very good research on detection of hidden services Retro Guy <retroguy@novabbs.com> - 2021-04-04 23:54 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-05 03:52 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-05 08:12 -0700
  Re: Very good research on detection of hidden services Anonymous <poster@anon.com> - 2021-04-06 13:28 -0700
  Final stats and address list Anonymous <poster@anon.com> - 2021-04-17 12:22 -0700
    Re: Final stats and address list Anonymous@news.novabbs.org (Anonymous) - 2021-04-18 00:52 +0000
  None Anonymous <poster@anon.com> - 2021-04-18 04:51 -0700
