Quick guide: demystifying the use of breach data

We explore a few common data breach search sites such as DeHashed and IntelligenceX, and discuss how they can be used to great effect to improve digital security and privacy.

SMU

Feb 13, 2021

Data is the commodity of the 21st century, and cyber-crime pays — especially when it offers access to countless billions of stolen or leaked records, many of which hold promise for additional nefarious activity (we had to use that word at least once).

There are a number of websites offering targeted searches of this breach data. We explore these sites and how they can sharpen your understanding of the risk created by exposure of your personal information online.

The best part about all this? It’s completely licit, publicly accessible information. So jump on in to learn more about how you can use breach data to understand where you’re vulnerable online.

Okay so what’s the deal with this stuff?

Breach data is awesome; data breaches are not. The general guiding principle here (*not legal advice*; we have two attorneys on call but prefer they do other stuff) is that once private information finds its way into a publicly accessible forum online, as it often does, it becomes legal and acceptable to use within certain contexts.

And for our purposes here, that context is to use this publicly available information for security research purposes — or to ensure your digital security and privacy are protected.

We want to tap into the sources that compile as much of this breach data as possible from across the deep, dark, and open web, and leverage it for our personal digital security and privacy.

How is this possible? As virtually everything moves online, more and more companies, governments, financial institutions, and others are struggling to properly and securely store the user data with which they are entrusted (and not simply due to a lack of effort). Unfortunately, a breach is not a question of if but rather when. This applies to pretty much everyone: Adobe, Equifax, Facebook, LinkedIn, eBay, Yahoo, the U.S. Government Office of Personnel Management, and countless others.

Cast into the deep (web)

Much like climbing, there are many paths to the top of the mountain. Some are better than others. We don’t have too much preference here as, ideally, we’d like to build our own version of this data over time that can be referenced offline and without relying on a third party, however trusted. But as it stands, our primary references today are IntelligenceX and DeHashed. Others such as Snusbase or WeLeakInfo are certainly options, but won’t be introduced in detail.

We really enjoy IntelligenceX (known as IntelX) as a service due to its extensive record count (over 34.5 billion), easy user interface, transparent privacy policy, and generous search options. These data search sites have made querying breach data as easy as typing into a Google-like search box. And the number of unique selectors you can search for offers a wide net with which to cast into the deep.

Most important (for us at least) are searches yielding email addresses, phone numbers, credentials (i.e. plaintext passwords), addresses, and other personal data of the sensitive sort. Often, these personal selectors will yield passwords that either tell us more about a subject or allow us to find other accounts owned by the same subject (please don’t reuse your passwords). For digital security and privacy work, this gives us a sense of what’s exposed and where.

Were this a criminal enterprise of ill repute, that unencrypted password may be the very key we need to access your email account, establish a covert forwarding rule for your inbox, and have unfettered real-time access to your future communications. Or it could mean access to your social media account, for which we would happily change your password to a more complicated one that only we know. Unless, of course, you were willing to pay our fee to release it to you. And these options only grow with the number of exposed records. The longer the account has been active, the greater exposure it faces.

You’ll note the general sources from which IntelX (and other similar sites) obtain their data. It is not uncommon to find links to torrents with gigabytes or terabytes’ worth of data available for downloading. It might take a few days, but this data is just a few literal clicks away.

Just the other day, for example, we obtained several gigabytes’ worth of completely public voter registration records from a number of U.S. states, complete with names, home addresses, political party, emails, and phone numbers: an amazing starting point for any online investigation (or stalker, criminal, crazy ex, etc.). Pair that with tools that allow us to accurately discover that individual’s true driver’s license number, and any adversary holds powerful information in their hands.

Thanks, aggregated publicly accessible information.

Gentlemen do not (?) read each other’s email

A quick note on email searches: because an email address is a personal asset normally tied to a specific individual, with an entire history of personal communications associated with it, understanding its exposure is paramount for a sound digital security and privacy posture.

Enter Have I Been Pwned (“HIBP” as it’s known), an easy-to-use and rather informative search site that checks your email address(es) against its archive of billions of records in order to determine your exposure.

The primary value of HIBP is that it identifies precisely which breaches contained your email address. While HIBP does not provide specific credentials, knowing which accounts were breached (and therefore what data was exposed) offers immediate feedback on the possible extent of the damage, what information is most at risk, and where you can focus your resources.
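HIBP won’t hand you credentials, but its companion Pwned Passwords service does let you check whether one of your own passwords has shown up in breach data without sending the password itself anywhere. What follows is a minimal sketch of that check as we understand the service’s range lookup: hash the password with SHA-1, send only the first five hex characters, and look for the remainder in the list that comes back. The function names (`split_sha1`, `count_in_range_response`, `fetch_range`) and the User-Agent string are our own inventions for illustration, and you should confirm the endpoint and response format against HIBP’s documentation before relying on it.

```python
import hashlib
import urllib.request


def split_sha1(password: str) -> tuple[str, str]:
    """Return the (5-char prefix, 35-char suffix) of the password's SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, body: str) -> int:
    """Look for our suffix in a response made of 'SUFFIX:COUNT' lines.

    Returns the reported count, or 0 if the suffix is not listed.
    """
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix: str) -> str:
    """Fetch all suffixes sharing this prefix (assumed endpoint; makes a network call)."""
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    request = urllib.request.Request(
        url, headers={"User-Agent": "breach-exposure-sketch"}  # hypothetical agent name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def times_seen(password: str) -> int:
    """Only the 5-character prefix leaves the machine; the match happens locally."""
    prefix, suffix = split_sha1(password)
    return count_in_range_response(suffix, fetch_range(prefix))
```

Calling `times_seen("some password you use")` returns how many times that password appears in the service’s corpus. Anything above zero is a good reason to retire it everywhere.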

Climbing your analytical mountain

As you pick a route and begin ascending the mountain of enlightenment (tried pretty hard for that one), it is critical to note that accessing breach data is rarely the problem. Now, occasionally a subject will simply not possess much of a digital footprint online, similar to what online investigations firm Bellingcat experienced when researching the movements of Russian spies in Europe. Occasionally, old accounts are shut down and new anonymous ones with little history or connections are encountered. Sometimes, people wise up to the power of publicly available information and how it can be used against them. And when this occurs, another intelligence discipline is required to progress further.

More often than not, however, the difficulty lies in analyzing the vast quantities of data available to us. This remains a core theme of the intelligence business, which — despite seeing rapid technological advances across every intelligence discipline — still requires human minds to process and make useful the available data in the most effective manner possible.

Because we’re running with a great trend (2-0) right now, this is a shameless plug for the investigative services we offer. You’re not paying for the searches — you’re paying for the years of analytical, investigative, and intelligence experience that’s gathering, processing, analyzing, and (hopefully) making the data useful to you.

Help me understand my exposure

At some point, we may highlight what an online investigation generally looks like as we trace out and analyze the connections between various records available to us. Piecing together addresses, dates of birth, driver’s license numbers, voter registration records, other public records, IP addresses, etc. usually paints quite the picture. For now, let’s explore the mechanics of a targeted search for a personal email address.

Some data in the wild

We were once contacted by an individual who thought their email account may have been “hacked” and required some guidance on how to respond. Thankfully, this person reached out to us and knew to be wary of phishing attacks, to not reuse their passwords, and other digital security basics. They sent us this screenshot of the “automated” email they had received, which indicated a most certainly unauthorized party — ostensibly located in Bosnia and Herzegovina — had attempted to access their mail account, which triggered an alert to their inbox.

Screenshot submitted by the affected individual alerting them to the unauthorized attempt to access their account.

Let’s do a quick analysis. First, the fact that criminals had attempted to access this individual’s account was significant because Yahoo had experienced a series of breaches, disclosed in 2016 and 2017, that compromised the usernames and passwords of roughly three billion accounts. Years later, we’re still seeing the ripple effects of such breaches. We could infer that this unauthorized login attempt was a direct result of this individual’s credentials being compromised in a Yahoo-related breach, but we wanted to confirm.

Enter HIBP, which offered a simple yet informative picture of where this individual’s credentials had been observed. As we can see below, this account had several years’ worth of history and had been caught up in no fewer than ten separate breaches across the various services the individual used, including Chegg, Exactis, Under Armour’s MyFitnessPal, Tumblr, and others.

Screenshot from HIBP highlighting the number of breaches in which the individual’s account was observed.
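To turn a list of breaches like this into priorities, it helps to tally which kinds of data were exposed most often. Here is a rough sketch, assuming each breach is a dictionary with `Name`, `BreachDate`, and `DataClasses` keys (a shape modelled on how HIBP describes a breach, though you should treat the exact field names as an assumption). The breaches in `SAMPLE_BREACHES` are made up for illustration and are not the individual’s actual results.

```python
from collections import Counter

# Made-up breach records for illustration only; not real breach data.
SAMPLE_BREACHES = [
    {"Name": "ExampleShop", "BreachDate": "2016-05-01",
     "DataClasses": ["Email addresses", "Passwords"]},
    {"Name": "ExampleFitness", "BreachDate": "2018-02-01",
     "DataClasses": ["Email addresses", "Usernames", "Passwords"]},
    {"Name": "ExampleMarketing", "BreachDate": "2018-06-01",
     "DataClasses": ["Email addresses", "Physical addresses"]},
]


def exposure_summary(breaches):
    """Count how many breaches exposed each data class, most frequent first."""
    counts = Counter()
    for breach in breaches:
        # set() so a data class listed twice in one breach is counted once
        counts.update(set(breach.get("DataClasses", [])))
    # Sort by descending count, then alphabetically so ties are stable
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def earliest_breach(breaches):
    """Return the breach with the oldest date (ISO dates sort as plain strings)."""
    dated = [b for b in breaches if b.get("BreachDate")]
    return min(dated, key=lambda b: b["BreachDate"], default=None)


if __name__ == "__main__":
    for data_class, count in exposure_summary(SAMPLE_BREACHES):
        print(f"{data_class}: exposed in {count} breach(es)")
    print("Oldest exposure:", earliest_breach(SAMPLE_BREACHES)["Name"])
```

The data classes at the top of the list are where to spend effort first: passwords exposed in several breaches mean rotating credentials, while exposed physical addresses call for a different kind of response.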

Now, occasionally, an affected individual will exclaim that they had never used one of the services in which their data was observed in a breach. While unfortunate, we can assume that this person’s email address was likely sold by any number of services they used, and included in a marketing list of some kind that was purchased or obtained by the service that suffered the breach. If you didn’t give them your email, yet they have it, there aren’t too many other options for attributing its origin.

We had a decent understanding of the scope of compromise for this individual’s account, and then jumped into DeHashed to explore specific credentials that were exposed. Sure enough, search results yielded several plaintext (unencrypted) instances of this individual’s username and password, compiled from the ten breaches in which their data was observed.

DeHashed search result exemplar displaying an individual’s personal email address, username, hashed password (plaintext also available), address, and IP address: a significant amount of sensitive information in aggregate.

Again, were this a criminal enterprise of ill repute, this treasure trove of personal data is now pleasantly available at zero cost — to us. For you, however, this is significant leverage with which we can hold you, your family, or your business at risk. Depending on your threat model, risk tolerance, and desire to defend yourself, this risk could pose significant damage to your reputation, physical safety, financial security, or other form of what we’d call critical assets.

As we alluded to above, the fun part joins analytical and investigative powers with the publicly available data. The search yields an email address with a plaintext password; we take the password and search for it in order to uncover possible reuse across previously unknown accounts; these new accounts link to hidden social media accounts, which yield additional details about a private personal life; and so on.
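The password pivot described above is simple enough to sketch, and running it against your own exposed records is a quick way to see how far password reuse spreads. This is a minimal illustration, not how any particular search site works: the record shape (`email`, `password`, `source`) is our assumption, and every record in `SAMPLE_RECORDS` is invented.

```python
# Invented records for illustration; the field names are an assumed shape.
SAMPLE_RECORDS = [
    {"email": "alice@example.com", "password": "Tr0ub4dor&3", "source": "breach-a"},
    {"email": "a.smith@example.org", "password": "Tr0ub4dor&3", "source": "breach-b"},
    {"email": "bob@example.com", "password": "hunter2", "source": "breach-a"},
    {"email": "alice@example.com", "password": None, "source": "breach-c"},
]


def pivot_on_password(records, seed_email):
    """Return other accounts that share at least one password with seed_email."""
    # Step 1: collect every known password for the seed account
    seed_passwords = {
        r["password"]
        for r in records
        if r["email"] == seed_email and r.get("password")
    }
    # Step 2: find different accounts that use any of those passwords
    linked = {
        r["email"]
        for r in records
        if r.get("password") in seed_passwords and r["email"] != seed_email
    }
    return sorted(linked)


if __name__ == "__main__":
    print(pivot_on_password(SAMPLE_RECORDS, "alice@example.com"))
```

Each account this surfaces becomes a new seed for the next search, which is exactly why a single reused password can unravel so much.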

These threads are pulled as far as they can, up until conditions are met to action them in a manner that achieves our end-state. For us today, that is finding security in understanding how you and your data are potentially at risk, and then implementing a series of fundamental digital security and privacy measures that can significantly reduce the risk posed from such exposure. Get the guide here and get ahead of the compromise.

Go forth in privacy and digital security.

Mason

Feb 24, 2021

So it is legal, in that I used the tools you mentioned (been awhile since I used "pwnd") to check, and found one breach and one paste (not surprising, I did get an alert and changed to even more complex passwords, as well as other mechanisms). So legal to see if I have been the victim of illegal exposure of private information? Illegal if someone uses it to steal my ID? Odd juxtaposition!

However, great article, and thanks for the security reminders!

Privacy Matters
