Threat hunting for phishing sites with urlscan.io

Bradley Kemp

urlscan.io is an incredible tool for taking a snapshot of a phishing website. It doesn't just take a screenshot of the page, but also captures all the resources loaded and requests made by the page. This data is then accessible through the search feature.

Because of its generous free tier, many people use urlscan.io, so there's a wealth of scan data to search through for phishing sites.

Finding an initial phishing site

So you want to find phishing targeting a specific brand. There are some open source phishing detection tools you could set up, but what about historical phishing sites? urlscan.io's powerful search functionality has you covered.

Find domains containing a brand name

It's extremely common for phishing sites to contain the name of the brand they're impersonating.

You can find domains containing a brand name (but which aren't the brand's legitimate domain) using a query like this:

page.domain:(/.*brandName.*/ AND NOT brand.com AND NOT brand.org)

This query returns results where the page domain matches the regular expression .*brandName.* (i.e. contains "brandName" somewhere) and the domain is not one of the listed legitimate domains.

For example, looking for TikTok-related phishing sites we could use the query:

page.domain:(/.*tiktok.*/ AND NOT tiktok.com AND NOT www.tiktok.com AND NOT shop.tiktok.com)

Search results for domains containing "tiktok"
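
If you track several brands, it can help to generate this query rather than typing it by hand. Here's a minimal sketch: the function name brand_domain_query is made up for this example, and it assumes the brand name is plain letters and digits (anything with regex metacharacters would need escaping first).

```python
def brand_domain_query(brand: str, legitimate_domains: list[str]) -> str:
    """Build a search query for page domains containing `brand`.

    Hypothetical helper: assumes `brand` has no regex metacharacters.
    """
    # Each legitimate domain becomes one "AND NOT <domain>" clause.
    exclusions = "".join(f" AND NOT {domain}" for domain in legitimate_domains)
    return f"page.domain:(/.*{brand}.*/{exclusions})"


print(brand_domain_query("tiktok", ["tiktok.com", "www.tiktok.com", "shop.tiktok.com"]))
```

Called with the TikTok values above, this should reproduce the example query exactly.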

Find sites hotlinking legitimate assets

Depending on how a phishing site is constructed, it might load assets (like logos) from the real website.

You can find sites loading legitimate assets using this query:

domain:brand.com AND NOT page.domain:brand.com

This returns sites where:

  • At least one of the requests was to brand.com
  • The domain of the overall page wasn't brand.com

This technique works particularly well if a brand's assets are hosted in something like an S3 bucket.
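
When the assets live on a separate host (such as a bucket), the domain you search for and the domains you exclude are different. A rough sketch of building that query follows; hotlink_query and the hostnames are placeholders invented for this example, not real infrastructure.

```python
def hotlink_query(asset_domain: str, legitimate_page_domains: list[str]) -> str:
    """Build a query for pages that request `asset_domain` but aren't the brand's own pages.

    Hypothetical helper: it chains one "AND NOT page.domain:" clause per
    legitimate domain, in the same style as the query above.
    """
    exclusions = "".join(
        f" AND NOT page.domain:{domain}" for domain in legitimate_page_domains
    )
    return f"domain:{asset_domain}{exclusions}"


# Placeholder hostnames for illustration only.
print(hotlink_query("brand-assets.s3.amazonaws.com", ["brand.com", "www.brand.com"]))
```

With a single domain used for both arguments, this collapses to the same query shown earlier.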

Pivoting from one phishing site to others

Once you've found one phishing site, it's useful to find other instances of that same phishing kit. urlscan.io makes pivoting between similar sites a breeze. It's often actually easier to pivot between instances of a phishing kit than to find the first one!

There are many ways to pivot between phishing sites, but here are the methods that work with urlscan.io.

Structural Similarity

urlscan.io has a built-in feature which tries to identify "structurally similar" websites to the one you've scanned. This feature is experimental and the quality of results varies, but as it only takes a single click, it's worth trying.

To view structurally similar sites:

  1. Go to the results page for a scan. For example: https://urlscan.io/result/80af8124-5097-4b82-acde-b4a06cb4962a/.
  2. Click on the "Similar" tab (it'll have an orange badge on it telling you how many matches there are)
  3. Look for the "structurally similar hits" section (if there are no results, this section will be empty)
Similarity results for an example urlscan.io result

Pivoting on filename

A more involved method (but one that usually gives better results) is to search urlscan.io for results loading the same files as the phishing site you've already identified. Because phishing sites are generally deployed over and over again from the same "kit", there'll be files in common.

On the "HTTP" tab you can see all the requests made by the site:

HTTP requests made by an example urlscan.io result

Here you'll need a little intuition to pick a good filename for pivoting. You want something which will be named the same across different instances of this phishing kit (so anything dynamic should be excluded), but you also want to rule out anything too generic which will be seen across lots of different, unrelated sites. Definitely avoid:

  • Open source JavaScript libraries
  • Generic filenames (like logo.svg)

Here saba9m.JPG seems like a good choice for pivoting. Clicking on the filename takes you straight to the search results for that filename and we see our intuition was right as there's a bunch of other malicious sites loading this file:

Search results for sites requesting saba9m.JPG

You can do your own filename searches using a query like this:

filename:"name.extension"
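
The "avoid generic names and open source libraries" advice can be roughly approximated in code. This is only a sketch: the deny lists below are small illustrative guesses (not a list from urlscan.io), the helper names are invented, and the example URLs are placeholders. You'll still want to eyeball the output.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative deny lists only; extend them for your own use.
GENERIC_NAMES = {"logo.svg", "style.css", "favicon.ico", "index.html", "main.js"}
LIBRARY_PREFIXES = ("jquery", "bootstrap", "font-awesome")


def candidate_filenames(request_urls: list[str]) -> list[str]:
    """Return filenames that look specific enough to pivot on."""
    names: list[str] = []
    for url in request_urls:
        # Take the last path segment, ignoring any query string.
        name = PurePosixPath(urlsplit(url).path).name
        if not name or "." not in name:
            continue  # no filename in this request
        lowered = name.lower()
        if lowered in GENERIC_NAMES or lowered.startswith(LIBRARY_PREFIXES):
            continue  # too generic, or a common library
        if name not in names:
            names.append(name)
    return names


def filename_query(name: str) -> str:
    """Wrap a filename in the search syntax shown above."""
    return f'filename:"{name}"'


requests_seen = [
    "https://phish.example/",
    "https://phish.example/js/jquery.min.js",
    "https://phish.example/logo.svg",
    "https://phish.example/img/saba9m.JPG",
]
for name in candidate_filenames(requests_seen):
    print(filename_query(name))
```

For the placeholder requests above, only saba9m.JPG survives the filter.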

Pivoting on response hash

Sometimes there are no good filenames you can use for pivoting. Perhaps a site only loads generically named assets (style.css, logo.svg, etc.) or maybe the assets are embedded in the HTML itself so there are no other requests.

In this case, you can try pivoting on the hashes of responses. Pivoting on hashes is extremely specific: you'll only get results where the site loaded a file with identical contents to the site you're interested in. However, this method is more brittle than pivoting on filenames: if the phisher has updated the kit at all, the hashes will change, and you'll miss these sites in the results. This also means you can't pivot on the hash of anything where the response is dynamic (for example, if there's a random challenge included in every page load).

Search by file hash using a query like:

hash:SHA256_hash_of_a_file
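
If you've saved a response body from a kit yourself, you can compute the SHA256 locally and build the query from it. A small sketch follows; hash_query is a name made up for this example, and it assumes you hash the exact bytes the server returned.

```python
import hashlib


def hash_query(body: bytes) -> str:
    """Build a hash search query from the raw bytes of a response body."""
    digest = hashlib.sha256(body).hexdigest()
    return f"hash:{digest}"


original = b"<html>login form</html>"
modified = b"<html>login form!</html>"

print(hash_query(original))
# A single changed byte gives a completely different hash, which is
# why this pivot misses kits that have been edited even slightly.
print(hash_query(original) == hash_query(modified))
```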

Getting search results through the API

urlscan.io's search results aren't just accessible through the website: you can also get them via the API.

To get search results using the API, fetch the following URL (with your query URL-encoded):

https://urlscan.io/api/v1/search/?q=<your query>

The response is JSON containing an array of results, which you can use as input into whatever automation you like. At Phish Report, we feed these search results into our IOK engine for detecting known phishing kits.
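
Here's a minimal sketch of calling the search endpoint using only the Python standard library. The function names are invented for this example, and it assumes the matches sit under a "results" key in the JSON response; check the API documentation for the exact response shape, rate limits, and authentication options before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def search_url(query: str) -> str:
    """Return the search URL with the query safely URL-encoded."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


def search(query: str) -> list[dict]:
    """Fetch search results for `query` (makes a network request)."""
    with urlopen(search_url(query), timeout=30) as response:
        payload = json.load(response)
    # Assumption: matches are returned under the "results" key.
    return payload.get("results", [])


print(search_url('filename:"saba9m.JPG"'))
```

Calling search('filename:"saba9m.JPG"') would then return a list of dictionaries you can loop over. Encoding matters because queries routinely contain quotes, colons, and spaces.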