The key to successfully combating phishing is detecting it early: the sooner you can report a phishing site to its hosting provider, the fewer people will fall victim to it. But you don't need any expensive services to do this; it's possible to build your own phishing detection system for free using open source tools.
How? By using Certificate Transparency logs.
Certificate Transparency (CT) logs are public databases of every HTTPS certificate issued by publicly-trusted Certificate Authorities (i.e. certificates that work in standard web browsers). They were originally introduced to address the issue of fraudulent or maliciously issued certificates, but have turned out to be extremely powerful for detecting phishing.
Because Chrome and Safari will only trust an HTTPS certificate if it has been submitted to multiple CT logs, all public HTTPS certificates get submitted to these logs as soon as they're issued. So, unlike newly registered domain lists, which are typically updated only once a day, new domains appear in CT logs almost instantly. They also include subdomains, not just root domain names.
Best of all, Certificate Transparency logs are completely open, so you can download their contents for free.
phishing_catcher is an open source project which scans CT logs for domains containing multiple suspicious keywords.
In its default configuration, phishing_catcher is set up to collect potential phishing domains of any kind, using a scoring system built around generic security-themed keywords:
keywords:
    # Generic suspicious
    'login': 25
    'log-in': 25
    'sign-in': 25
    'signin': 25
    'account': 25
    'verification': 25
    'verify': 25
    'webscr': 25
    'password': 25
    'credential': 25
    'support': 25
    'activity': 25
    'security': 25
    # ...
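To make the scoring idea concrete, here is a minimal Python sketch of keyword-based scoring. It assumes a domain earns a keyword's score whenever that keyword appears anywhere in it as a substring. This is a simplification for illustration: the real catch_phishing.py may apply additional heuristics, so don't expect these numbers to match what the tool prints.

```python
"""Minimal sketch of keyword-based domain scoring (illustrative only)."""

# The generic keywords shown above, with their scores.
GENERIC_KEYWORDS = {
    "login": 25,
    "log-in": 25,
    "sign-in": 25,
    "signin": 25,
    "account": 25,
    "verification": 25,
    "verify": 25,
    "webscr": 25,
    "password": 25,
    "credential": 25,
    "support": 25,
    "activity": 25,
    "security": 25,
}


def keyword_score(domain: str, keywords: dict[str, int]) -> int:
    """Sum the score of every keyword that appears as a substring of the domain."""
    domain = domain.lower()
    return sum(score for word, score in keywords.items() if word in domain)


if __name__ == "__main__":
    for d in ["secure-login-account.example.com", "example.com"]:
        # "login" and "account" match in the first domain; nothing matches in the second.
        print(d, keyword_score(d, GENERIC_KEYWORDS))
```

Because every matching keyword adds to the total, a domain stuffed with several security-themed words scores much higher than one that happens to contain a single common word like "support".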
But it's easy to reconfigure this scoring system so that it focuses specifically on domains targeting your brand.
Using phishing_catcher to detect phishing for your brand
phishing_catcher is a Python project which you can run locally on your laptop or, ideally, on a server where it can run uninterrupted.
The first step in setting it up is cloning the repository from GitHub:
$ git clone https://github.com/x0rz/phishing_catcher.git
Now you'll have a folder named phishing_catcher containing two key files:
catch_phishing.py: the script that consumes the CT logs looking for suspicious domains
suspicious.yaml: the keywords and associated scores the script uses to determine what is suspicious
If you open suspicious.yaml, you'll see it contains a bunch of keywords that probably aren't relevant to your brand.
Instead, you'll want to replace it with a config containing keywords and scores in three categories:
keywords:
    # Variations on your brand name (scored highly)
    acmebank: 50
    acme-bank: 50
    acme: 50
    # ...
    # Industry-specific keywords
    payment: 25
    payee: 25
    statement: 25
    balance: 25
    card: 25
    # ...
    # Generic keywords
    login: 25
    log-in: 25
    sign-in: 25
    signin: 25
    account: 25
    support: 25
    # ...
Your keywords and scores will absolutely evolve over time as you detect phishing sites and learn which keywords are commonly used, but these are a good starting point.
Bear in mind, phishing_catcher will only notify you about a domain if its total score is 65 or higher. That score is built from the scores of the keywords the domain contains, plus some extra heuristics the script applies on its own.
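As a worked example of how the threshold interacts with the scores, the sketch below reuses the example config above (acme and acmebank are just its placeholder brand names) and flags a domain once its summed keyword score clears the 65-point threshold. It assumes plain substring matching and ignores any other heuristics, so overlapping keywords stack: "acmebank" also contains "acme".

```python
"""Worked example: summed keyword scores vs. the 65-point reporting threshold."""

# Keywords from the example brand config ("acme" is a placeholder brand).
BRAND_KEYWORDS = {
    # Variations on your brand name (scored highly)
    "acmebank": 50,
    "acme-bank": 50,
    "acme": 50,
    # Industry-specific keywords
    "payment": 25,
    "payee": 25,
    "statement": 25,
    "balance": 25,
    "card": 25,
    # Generic keywords
    "login": 25,
    "log-in": 25,
    "sign-in": 25,
    "signin": 25,
    "account": 25,
    "support": 25,
}

# Minimum score for a domain to be reported (treated as inclusive in this sketch).
THRESHOLD = 65


def score(domain: str) -> int:
    """Sum the scores of all keywords found as substrings of the domain."""
    domain = domain.lower()
    return sum(points for word, points in BRAND_KEYWORDS.items() if word in domain)


def is_flagged(domain: str) -> bool:
    return score(domain) >= THRESHOLD


if __name__ == "__main__":
    for d in [
        "acme.com",                   # brand only: 50
        "login-support.example.com",  # generic only: 25 + 25 = 50
        "acme-login.example.com",     # brand + generic: 50 + 25 = 75
        "acmebank-secure.com",        # "acmebank" and "acme" both match: 100
    ]:
        print(f"{d}: score={score(d)} flagged={is_flagged(d)}")
```

The first two cases show why a brand-name score of 50 sits below the threshold on its own: a bare mention of the brand, or a generic login page, isn't enough to trigger a report, but the combination is. The last case shows that overlapping brand variations stack under substring matching, which is worth keeping in mind when you tune the scores.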
Running phishing_catcher is simple. From inside the phishing_catcher folder, you can run it either using Docker:
$ docker build . -t phishing_catcher
$ docker run phishing_catcher
Or directly on your system:
$ pip install -r requirements.txt
$ ./catch_phishing.py
You'll see a line confirming that it has connected to CertStream (the service it uses to consume Certificate Transparency logs), followed by a line for each suspicious domain it detects.
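If you write your own consumer, a good first step is pulling domain names out of each CertStream event. The sketch below assumes the JSON message shape CertStream publishes for new certificates (a "certificate_update" message type, with names under data, then leaf_cert, then all_domains) and reduces wildcard names like "*.example.com" to "example.com" before scoring. It deliberately contains no network code; with the certstream Python package, you would call a function like this from the callback you pass to certstream.listen_for_events. The function name extract_domains is hypothetical.

```python
"""Sketch: extract candidate domains from a CertStream-style message."""

from typing import Any


def extract_domains(message: dict[str, Any]) -> list[str]:
    """Return de-duplicated, lower-cased domain names from a certificate_update message.

    Heartbeats and other message types yield an empty list. Wildcard names
    such as "*.example.com" are reduced to "example.com".
    """
    if message.get("message_type") != "certificate_update":
        return []
    names = message.get("data", {}).get("leaf_cert", {}).get("all_domains", [])
    domains = []
    for name in names:
        name = name.lower()
        if name.startswith("*."):
            name = name[2:]
        domains.append(name)
    # dict.fromkeys drops duplicates while preserving the original order.
    return list(dict.fromkeys(domains))


if __name__ == "__main__":
    sample = {
        "message_type": "certificate_update",
        "data": {
            "leaf_cert": {
                "all_domains": ["*.Acme-Login.example.com", "acme-login.example.com"]
            }
        },
    }
    print(extract_domains(sample))
```

Keeping extraction separate from the network handling makes the rest of the pipeline easy to test against saved messages.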
phishing_catcher is a great proof of concept, but on its own isn't a production-ready detection system:
Thankfully, the logic is very simple, so you can fairly easily adapt the catch_phishing.py script, or even write your own.
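If you do write your own, one possible shape for the core is sketched below. It takes domain names you've already extracted (for example, inside a CertStream callback), scores them with simple substring keyword matching (an assumption, not necessarily how phishing_catcher scores), and remembers domains it has already seen, since the same name can turn up in more than one certificate, for example when a certificate is renewed. The Detector class, its keywords, and its threshold are hypothetical placeholders.

```python
"""Sketch of a minimal brand-focused detector with de-duplication."""

from dataclasses import dataclass, field


@dataclass
class Detector:
    keywords: dict[str, int]          # keyword -> score (placeholder values)
    threshold: int = 65               # report domains scoring at least this much
    seen: set[str] = field(default_factory=set)

    def score(self, domain: str) -> int:
        # Simple substring matching; swap in your own heuristics here.
        return sum(points for word, points in self.keywords.items() if word in domain)

    def check(self, domains: list[str]) -> list[tuple[str, int]]:
        """Return (domain, score) pairs for unseen domains at or above the threshold."""
        alerts = []
        for domain in domains:
            domain = domain.lower().removeprefix("*.")
            if domain in self.seen:
                continue  # already scored once; don't alert twice
            self.seen.add(domain)
            points = self.score(domain)
            if points >= self.threshold:
                alerts.append((domain, points))
        return alerts


if __name__ == "__main__":
    detector = Detector(keywords={"acme": 50, "login": 25, "account": 25})
    # The wildcard duplicate and the unrelated domain produce no extra alerts.
    print(detector.check(["acme-login.example.com", "*.acme-login.example.com", "example.org"]))
    # A later batch repeating a known domain is skipped.
    print(detector.check(["acme-login.example.com"]))
```

Because seen grows with every new domain, a long-running deployment would need to bound it (for example, by expiring old entries); that's left out to keep the sketch short. The returned pairs are a convenient hand-off point to whatever alerting or takedown workflow you use.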
The system described here can be extremely effective, but over time you'll likely find its performance weak in a couple of areas:
Both of these can be solved by analysing the website actually being hosted on each domain, not just the domain name. If this is something you're struggling with, we'd love to chat about how our enterprise plan can help.