Imagine this: you’re looking to harvest a bunch of cloud-storage user credentials through a phishing campaign. Your email is perfect, your domains are registered, and all you have left to do is make a copy of your target’s web page, modify the POST script and hit send. Within minutes, you are collecting credentials and your campaign is a success.
This process may sound complicated, but it’s not. Attackers can easily copy your website and use it against you or your customers in phishing campaigns. As defenders, there’s little we can do to stop this, but using data analysis techniques, we can identify ways to surface potential abuse.
One of the new datasets we released last week with version two of our API was “trackers”. We define trackers as unique codes or values found within web pages that can be correlated back to a central entity. Our current tracker dataset includes IDs from providers like Google, Yandex, Mixpanel, New Relic and Clicky, and it continues to grow on a regular basis.
What makes this dataset powerful is the ability to link different websites through values that are often overlooked by, or invisible to, the end user. In our phishing example above, it’s likely the phisher didn’t bother to remove the analytics details from the copied web page, or better yet, maybe they replaced the code with their own. Either way, there’s a high potential to surface their activity using our new trackers dataset.
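To make the idea concrete, here is a minimal sketch of how a leftover analytics ID can tie a cloned page back to the original. It is not how our crawlers extract trackers. It assumes only the classic Google Analytics "UA-" ID format, and the pages, IDs and function names are made up for illustration.

```python
import re

# Classic Google Analytics property IDs look like "UA-1234567-1".
# Other providers use other formats; this sketch covers only this one.
GA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def extract_ga_ids(html):
    """Return the set of Google Analytics IDs found in a page's HTML."""
    return set(GA_ID_PATTERN.findall(html))


def shared_trackers(page_a, page_b):
    """Return the tracker IDs that appear in both pages."""
    return extract_ga_ids(page_a) & extract_ga_ids(page_b)


# Made-up pages: the clone keeps the original's analytics snippet.
legit_page = "<script>ga('create', 'UA-1234567-1', 'auto');</script>"
cloned_page = (
    "<form action='/collect.php' method='POST'></form>"
    "<script>ga('create', 'UA-1234567-1', 'auto');</script>"
)
unrelated_page = "<script>ga('create', 'UA-7654321-2', 'auto');</script>"

print(shared_trackers(legit_page, cloned_page))
print(shared_trackers(legit_page, unrelated_page))
```

A non-empty result for two pages on unrelated domains is a hint worth a closer look, not proof of abuse.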
Trackers in the Web
Starting today, PassiveTotal users can see and pivot on tracker codes RiskIQ has collected over its years of web crawls. As with our other datasets, trackers show up in a tab and include details like the hostname, first seen, last seen, tracking type and tracking value.
Clicking on the tracker value performs a reverse search for any other websites we have seen using that tracking code, along with whether they have ever shown up on the RiskIQ blacklist. This is an incredibly powerful feature that can quickly surface malicious domains in a single click.
Knowing that some users strictly use the API or prefer the command line, we also built a tool using our new Python libraries. Located within the examples folder of our library is a script called “tracker_sentinel.py” that automates a lot of the discovery a user would otherwise need to do in the web interface.
Using this dataset, we've found the easiest way to identify bad sites is to start with a known good one. Tracker sentinel was built with this in mind and tends to output the best results when starting with a known good site. For example, take something like "dropbox.com" and enter it into the tool. Behind the scenes, our script queries for all tracking codes associated with "dropbox.com", then uses those codes to find all other properties that do not match that hostname and reports how each one relates to the original query.
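The pivot described above can be sketched in a few lines. This is an illustration of the logic only, not the contents of tracker_sentinel.py: the two dictionaries stand in for API lookups, the function name and the "google_analytics" type label are hypothetical, and treating subdomains as matching the known good hostname is our assumption here.

```python
def find_related_hosts(known_good, trackers_by_host, hosts_by_tracker):
    """Map each other hostname to the trackers it shares with known_good.

    trackers_by_host: hostname -> set of (tracker_type, tracker_value)
    hosts_by_tracker: (tracker_type, tracker_value) -> set of hostnames
    Both are hypothetical in-memory stand-ins for API queries.
    """
    related = {}
    # Step 1: collect every tracking code seen on the known good site.
    for tracker in trackers_by_host.get(known_good, set()):
        # Step 2: reverse search for every host seen using that code.
        for host in hosts_by_tracker.get(tracker, set()):
            # Step 3: skip the known good site and (by assumption) its subdomains.
            if host == known_good or host.endswith("." + known_good):
                continue
            related.setdefault(host, set()).add(tracker)
    return related


# Made-up sample data for illustration only.
tracker = ("google_analytics", "UA-1234567-1")
trackers_by_host = {"example.com": {tracker}}
hosts_by_tracker = {
    tracker: {"example.com", "www.example.com", "example-login.test"},
}

related = find_related_hosts("example.com", trackers_by_host, hosts_by_tracker)
for host, shared in sorted(related.items()):
    print(host, sorted(shared))
```

Keeping the shared trackers alongside each hostname shows why a result was flagged, which helps when deciding which ones deserve a closer look.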
We are super excited about our new datasets and even more thrilled to begin bringing more discovery automation to PassiveTotal. Be sure to subscribe to the GitHub repository for the latest updates and example tools we create using our API libraries.