Koert Kuipers · August 31, 2021

The Issue

The most simplistic way to discover bad actors is to check whether your potential customer is on a list of known bad actors. Obviously, bad actors catch on to that process, so they have people or companies stand in for them, allowing them to go on with their business without being discovered. And to make this harder to detect, they try to create multiple degrees of separation between themselves and the company or person standing in for them. Now if you think about that problem, it sounds relatively simple, right? All that has to be done is to trace back from the person you’re dealing with to the potential bad actor hiding behind them. But this is a network problem, and it quickly gets very unwieldy: before you know it, the steps between the person or company you’re dealing with and the bad actor grow to 2, 3, 4, 5, 6 or more hops. With each added level of indirection, it becomes nearly infeasible for a human being to uncover by looking through the network. And even for a computer this can become very expensive, to the point where it is computationally infeasible as well. And that is the problem we wanted to solve: to uncover these relationships, despite the fact that it’s very difficult to do.

The Magnitude of the Problem

So first, it is helpful to realize how quickly these levels of indirection (or hops) build up. A very simple example would be somebody who wants to open a bank account. But as it turns out, he or she is married to somebody who is an officer of a company, which is a subsidiary of another company, whose owner is a known bad actor. So effectively, you’re dealing with somebody married to an employee of a known bad actor! But if you take this apart logically, it is quite a complex multi-hop relationship: the first step was a marriage, the second step was officer-of, the third step was subsidiary-of, and the fourth step was owner. That’s already a four-hop relationship right there. And this can quickly get much larger: it’s not uncommon to see a seven-hop relation that is nonetheless credible and worthy of investigation. Now think about this as a network, and let’s assume each entity in the network has 10 relations. That’s not an unreasonable assumption; in reality it can be more. If you now need to go out 10 hops, and at each hop you need to inspect these 10 relations, then you’re quickly talking about inspecting billions (actually tens of billions) of potential entities in this network. For a human being this is just plain impossible; it would take more than a lifetime. And for most computer programs it is not realistic either.
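The combinatorial explosion above is easy to verify with a back-of-the-envelope calculation. This is a minimal sketch assuming, as in the text, a uniform 10 relations per entity; real networks are far less regular.

```python
# Back-of-the-envelope estimate of how fast a naive network search explodes.
# Assumes every entity has the same number of relations (a simplification).
def entities_to_inspect(relations_per_entity: int, hops: int) -> int:
    # At hop k there are relations_per_entity**k candidate entities;
    # a naive search inspects all of them, summed over every hop.
    return sum(relations_per_entity ** k for k in range(1, hops + 1))

print(entities_to_inspect(10, 4))   # the four-hop marriage/officer/subsidiary/owner example: 11110
print(entities_to_inspect(10, 10))  # ten hops: 11111111110, i.e. tens of billions
```

Even the modest four-hop example already means inspecting over ten thousand entities; at ten hops the count passes eleven billion.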

How We Solve This

As mentioned earlier, solving the problem of finding bad actors correctly means going beyond just checking whether a potential client is a bad actor: it involves uncovering the bad actors trying to hide behind these levels of indirection, these hidden relationships. To do that, you need to solve two very difficult problems.

The first problem is building (or finding) all these hidden relations and combining them with the more obvious (non-hidden) relations, so that you end up with one enormous collection of relations. The obvious relations will be based, for example, on corporate data. The hidden relations are much harder to uncover: multiple records for the same person (where the person never intended for that link to be found), people living at the same address, or people who are related. You could take this to an extreme, with a relation for people who are childhood friends, or people who live on the same block. So there are all kinds of relations. The first challenge is building or uncovering all of them and then having all that information in one place.
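One of the hidden relations mentioned above, people living at the same address, can be derived mechanically from raw records. The sketch below is illustrative only: the field names, the normalization rule, and the edge label are assumptions, not a real schema or Tresata’s method.

```python
import re
from collections import defaultdict

# Sketch of deriving one kind of hidden relation: a "same address" edge
# between people whose records share a normalized address.
def normalize_address(addr: str) -> str:
    # Lowercase and strip punctuation so "12 Oak St." matches "12 oak st".
    return re.sub(r"[^a-z0-9 ]", "", addr.lower()).strip()

def same_address_edges(records):
    by_address = defaultdict(list)
    for rec in records:
        by_address[normalize_address(rec["address"])].append(rec["name"])
    edges = []
    for names in by_address.values():
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                edges.append((names[i], names[j], "same_address"))
    return edges

records = [
    {"name": "A. Jones", "address": "12 Oak St."},
    {"name": "B. Lee",   "address": "12 oak st"},
    {"name": "C. Kim",   "address": "7 Pine Rd"},
]
print(same_address_edges(records))  # one edge: A. Jones -- B. Lee
```

In practice each relation type (shared address, shared phone, family ties) would be extracted this way and merged into one graph, which is exactly the “all the information in one place” requirement.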

The second challenge, once you have all the information in one place, is exploiting it to find the hidden or indirect multi-hop relationships between bad actors and the people with whom you might be considering doing business. This is a computational challenge, where terabytes of data have to be processed to find these Kevin Bacon-like multi-hop connections.
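The shape of that second challenge can be shown with a bounded multi-hop search. This is a minimal in-memory sketch, assuming the relation graph fits in an adjacency dict; the real problem needs a distributed engine, and the names here are illustrative, not Tresata’s API.

```python
from collections import deque

# Breadth-first search that records the chain of relations, so a hit comes
# back as an explainable path, not just a yes/no answer.
def find_connection(graph, start, bad_actors, max_hops=7):
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        entity, path = queue.popleft()
        if entity in bad_actors and entity != start:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(entity, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None

# The four-hop example from earlier: marriage, officer-of, subsidiary, owner.
graph = {
    "applicant": ["spouse"],
    "spouse": ["company_a"],      # officer-of
    "company_a": ["company_b"],   # subsidiary-of
    "company_b": ["owner_x"],     # owned-by
}
print(find_connection(graph, "applicant", {"owner_x"}))
```

On a real network, with its tens of billions of candidate paths, the frontier of this search is exactly what blows up, which is why the naive approach stops being feasible.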

Digging in a Little

Just because data is public does not mean the relationships in that data are easily available. The relationships might be hidden in there, but you still have to bring them out. Even just integrating leaked data, like the Panama Papers, with corporate registry data is non-trivial. And obviously, the more subtle your relations become (childhood friends), the harder they are to actually uncover in the data. On top of that, even if the direct (one-hop) relationships were all public and explicitly available (which they are not), multi-hop relations can still hide in plain sight: as you go out these hops, the data quickly becomes enormous, and uncovering multi-hop relations becomes very difficult. Because of this, a bad actor can hide in plain sight behind the sheer amount of data that needs to be investigated as one moves through these layers of indirection.

More about Tresata

What we have discovered in the last ten years is that almost all problems we were asked to help solve with our software came down to understanding what an entity is, and what its context is. This consists roughly of two steps. First is understanding what records unambiguously belong to each entity, and turning that knowledge into a rich description of the entity itself. Second is understanding how all these entities relate to each other, which results in an understanding of an entity within the context of all other entities that relate to it.
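The first of those two steps, deciding which records unambiguously belong to one entity, can be sketched with a toy matcher. Real record linkage at scale involves blocking, fuzzy scoring, and learned models; this exact-match sketch shows only the shape of the problem, not Tresata’s implementation, and the field names are assumptions.

```python
import re

# Toy record linkage: group records whose normalized names are identical,
# so records from different sources collapse into one entity.
def normalize_name(name: str) -> str:
    # Lowercase and drop punctuation so "John A. Smith" matches "JOHN A SMITH".
    return re.sub(r"[^a-z ]", "", name.lower()).strip()

def group_records(records):
    groups = {}
    for rec in records:
        groups.setdefault(normalize_name(rec["name"]), []).append(rec)
    return list(groups.values())

records = [
    {"name": "John A. Smith", "source": "registry"},
    {"name": "JOHN A SMITH",  "source": "leak"},
    {"name": "Mary Jones",    "source": "registry"},
]
print([len(g) for g in group_records(records)])  # → [2, 1]
```

Each resulting group becomes one entity with a richer description than any single record, which is what the second step (relating entities to each other) then builds on.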

The number one capability we built in the last 10 years was scalable record linkage. What we are doing now is taking what we learned in record linkage and applying it to understanding the enormous network of relations, or possible relations, that ties entities together. On top of this network we are building algorithms that exploit this information; internally, we call these algorithms harvesters. The goal of the harvesters is to learn things about these entities and their relationships with each other, and then turn that into useful, actionable knowledge for applications like bad actor detection, fraud prevention, and prospecting.

To learn more about how Tresata’s Digital Business Engine can help transform your business, @ us at
