urlscan.io is an incredible tool for taking a snapshot of a phishing website. It doesn't just take a screenshot of the page, but also captures all the resources loaded and requests made by the page. This data is then accessible through the search feature.
Because of its generous free tier, many people use urlscan.io, so there's a wealth of data to search through for phishing sites.
So you want to find phishing targeting a specific brand. There are some open source phishing detection tools you could set up, but what about historical phishing sites? urlscan.io's powerful search functionality has you covered.
It's extremely common for phishing sites to contain the name of the brand they're impersonating.
You can find domains containing a brand name (but which aren't the brand's legitimate domain) using a query like this:
page.domain:(/.*brandName.*/ AND NOT brand.com AND NOT brand.org)
This query returns results where the page domain matches the regular expression .*brandName.* (i.e. contains "brandName" somewhere) and the domain is not one of the listed legitimate domains.
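If you're checking several brands, you may prefer to generate this query rather than type it by hand. The helper below is a minimal sketch, and `build_brand_query` is a hypothetical name. It assumes that indexed domains are lowercase (so it lowercases the brand) and it only accepts characters that are valid in a domain label, which avoids having to escape regex metacharacters.

```python
import re

# Domain labels only contain letters, digits and hyphens. Rejecting anything
# else means we never need to escape regex metacharacters in the brand name.
_LABEL_CHARS = re.compile(r"[a-z0-9-]+")


def build_brand_query(brand: str, legit_domains: list[str]) -> str:
    """Build a query for page domains containing `brand`, excluding legit ones."""
    term = brand.lower()  # assumption: indexed domains are stored lowercase
    if not _LABEL_CHARS.fullmatch(term):
        raise ValueError(f"unsupported characters in brand name: {brand!r}")
    # Each legitimate domain becomes an "AND NOT <domain>" clause.
    exclusions = "".join(f" AND NOT {domain}" for domain in legit_domains)
    return f"page.domain:(/.*{term}.*/{exclusions})"


print(build_brand_query("brandName", ["brand.com", "brand.org"]))
# page.domain:(/.*brandname.*/ AND NOT brand.com AND NOT brand.org)
```

Only the lowercasing differs from the hand-written query above. Drop it if you find the search is case-insensitive for your use.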
For example, to look for TikTok-related phishing sites, we could use the query:
Depending on how a phishing site is constructed, it might load assets (like logos) from the real website.
You can find sites loading legitimate assets using this query:
domain:brand.com AND NOT page.domain:brand.com
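This returns sites where the page loaded at least one resource from brand.com, but the page itself isn't hosted on brand.com.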
This technique works particularly well if a brand's assets are hosted in something like an S3 bucket.
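If a brand serves assets from more than one host, you can cover them all in a single query by grouping domains with OR, in the same way the brand-name query groups terms in parentheses. The helper below is a sketch with hypothetical names, and the bucket hostname in the example is invented for illustration. With a single asset domain and a single legitimate domain it reproduces the query above exactly.

```python
def _group(terms: list[str]) -> str:
    """Join terms with OR, adding parentheses only when there's more than one."""
    if not terms:
        raise ValueError("at least one domain is required")
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


def build_asset_query(asset_domains: list[str], legit_page_domains: list[str]) -> str:
    """Pages that load assets from `asset_domains` but aren't hosted on a legit domain."""
    return f"domain:{_group(asset_domains)} AND NOT page.domain:{_group(legit_page_domains)}"


print(build_asset_query(["brand.com"], ["brand.com"]))
# domain:brand.com AND NOT page.domain:brand.com

# Hypothetical S3 bucket hosting the brand's logos:
print(build_asset_query(["brand.com", "brandassets.s3.amazonaws.com"], ["brand.com"]))
# domain:(brand.com OR brandassets.s3.amazonaws.com) AND NOT page.domain:brand.com
```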
Once you've found one phishing site, it's useful to find other instances of that same phishing kit. urlscan.io makes pivoting between similar sites a breeze. It's often actually easier to pivot between instances of a phishing kit than to find the first one!
There are many ways to pivot between phishing sites, but here are the methods that work with urlscan.io:
urlscan.io has a built-in feature which tries to identify "structurally similar" websites to the one you've scanned. This feature is experimental and the quality of results varies, but as it only takes a single click, it's worth trying.
To view structurally similar sites:
A more involved method (but usually giving better results) is to search urlscan.io for results loading identical files as the phishing site you've already identified. Because phishing sites are generally deployed over and over again from the same "kit", there'll be files in common.
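To see the "files in common" idea concretely, suppose you've copied the request URLs of two suspected kit instances from their scans. The sketch below reduces each URL to its filename (the last path segment, with any query string dropped) and intersects the two sets. The function names and example URLs are hypothetical.

```python
from posixpath import basename
from urllib.parse import urlparse


def filenames(request_urls: list[str]) -> set[str]:
    """Extract the filename (last path segment) from each request URL."""
    names = set()
    for url in request_urls:
        name = basename(urlparse(url).path)  # urlparse separates off the query string
        if name:  # URLs ending in "/" (like the page itself) have no filename
            names.add(name)
    return names


def shared_filenames(site_a: list[str], site_b: list[str]) -> set[str]:
    """Filenames requested by both sites: hints that they share a phishing kit."""
    return filenames(site_a) & filenames(site_b)


site_a = [
    "https://phish-one.example/",
    "https://phish-one.example/assets/kit-logo.png",
    "https://phish-one.example/js/app.js",
]
site_b = [
    "https://phish-two.example/login/",
    "https://phish-two.example/img/kit-logo.png?v=2",
    "https://cdn.example/jquery.min.js",
]
print(shared_filenames(site_a, site_b))  # {'kit-logo.png'}
```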
On the "HTTP" tab you can see all the requests made by the site:
Here you'll need a little intuition to pick a good filename for pivoting. You want something which will be named the same across different instances of this phishing kit (so anything dynamic should be excluded), but you also want to rule out anything too generic which will be seen across lots of different, unrelated sites. Definitely avoid:
saba9m.JPG seems like a good choice for pivoting.
Clicking on the filename takes you straight to the search results for that filename, and our intuition was right, as there are plenty of other malicious sites loading this file:
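The intuition for choosing a filename can be roughly approximated in code, although a human eye is still the better judge. The filter below is a heuristic sketch under two assumptions of our own: a hand-maintained set of generic names (the entries here are only examples), and the guess that long hex or digit runs indicate a dynamic or cache-busted name. Everything that survives is a pivot candidate.

```python
import re
from collections.abc import Iterable

# Example generic names seen across many unrelated sites; extend as needed.
GENERIC_NAMES = {
    "index.html", "favicon.ico", "style.css", "logo.png",
    "jquery.min.js", "bootstrap.min.css", "bootstrap.min.js",
}

# Long hex runs (hashes) or long digit runs (timestamps) suggest a dynamic name.
DYNAMIC_PATTERN = re.compile(r"[0-9a-f]{16,}|\d{6,}", re.IGNORECASE)


def pivot_candidates(names: Iterable[str]) -> list[str]:
    """Drop generic and dynamic-looking filenames, keeping possible pivots."""
    candidates = []
    for name in names:
        if name.lower() in GENERIC_NAMES:
            continue  # too common to narrow anything down
        if DYNAMIC_PATTERN.search(name):
            continue  # likely differs between kit instances
        candidates.append(name)
    return sorted(candidates)


print(pivot_candidates([
    "jquery.min.js",
    "saba9m.JPG",
    "a1b2c3d4e5f6a7b8c9d0.js",
    "session_1699999999.png",
]))
# ['saba9m.JPG']
```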
You can do your own filename searches using a query like this:
Sometimes there are no good filenames you can use for pivoting.
Perhaps a site only loads generically named assets, or maybe the assets are embedded in the HTML itself, so there are no other requests.
In this case, you can try pivoting on the hashes of responses. Pivoting on hashes is extremely specific: you'll only get results where the site loaded a file with contents identical to those on the site you're interested in. But this method is more brittle than pivoting on filenames: if the phisher has updated the kit at all, the hashes will change, and you'll miss these sites in the results. This also means you can't pivot on the hash of anything where the response is dynamic (for example, if there's a random challenge included in every page load).
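The brittleness is easy to demonstrate. The sketch below assumes the hashes are SHA-256 digests of the raw response body, and `render_login_page` is a hypothetical kit page that embeds a random challenge. A one-byte edit to a static file, or a fresh challenge on every load, produces a completely different digest, so neither would match a hash search.

```python
import hashlib
import secrets


def body_hash(body: bytes) -> str:
    """SHA-256 hex digest of a response body (assumed hashing scheme)."""
    return hashlib.sha256(body).hexdigest()


def render_login_page(challenge: str) -> bytes:
    """Hypothetical kit page embedding a per-load random challenge."""
    return f"<form><input type='hidden' name='c' value='{challenge}'></form>".encode()


# A tiny edit to a static file changes the hash entirely...
print(body_hash(b"body{color:red}") == body_hash(b"body{color:red;}"))  # False

# ...and so does a dynamic page, even though the kit hasn't changed.
first = body_hash(render_login_page(secrets.token_hex(8)))
second = body_hash(render_login_page(secrets.token_hex(8)))
print(first == second)  # almost certainly False
```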
Search by file hash using a query like:
urlscan.io's search results aren't just accessible through the website; you can also get them via the API.
To get search results using the API, fetch the URL:
And you'll get an array of results to use as input into whatever automation you like. At Phish Report, we feed these search results into our IOK engine for detecting known phishing kits.
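As a starting point for that automation, the sketch below does two things. It URL-encodes a query onto a search endpoint that you pass in (not hard-coded here), and it de-duplicates page domains from the JSON response. The response shape is an assumption: a top-level "results" array whose entries have a "page" object with a "domain" field. Fetch the URL with whichever HTTP client you prefer and check the real response before relying on these field names.

```python
import json
from urllib.parse import urlencode


def build_search_url(endpoint: str, query: str) -> str:
    """Append the query as a URL-encoded `q` parameter to the search endpoint."""
    return f"{endpoint}?{urlencode({'q': query})}"


def unique_page_domains(response_text: str) -> list[str]:
    """De-duplicated page domains, in first-seen order.

    Assumes {"results": [{"page": {"domain": ...}}, ...]}; entries without
    a page domain are skipped.
    """
    seen: set[str] = set()
    domains: list[str] = []
    for result in json.loads(response_text).get("results", []):
        domain = result.get("page", {}).get("domain")
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


print(build_search_url("https://search.example/api", "domain:brand.com"))
# https://search.example/api?q=domain%3Abrand.com

sample = '{"results": [{"page": {"domain": "a.example"}}, {"page": {"domain": "a.example"}}]}'
print(unique_page_domains(sample))  # ['a.example']
```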