
6 - unhidden

(2021-11-01)

After the great feeling that was getting spelunkTor published and off my plate, I decided it was time to start wrapping up my current round of Tor research so I could focus on another project that's been needing my attention.

Thus today I've published unhidden! unhidden is a tool to de-anonymize Onion Services (previously called Hidden Services, hence the name). Onion Services exist to anonymize the server in a client-server model, so de-anonymizing one is quite bad. I'll do my best to explain the motives for this research, the general idea of how it works, and maybe where we can go from here.

Motive Operandi

I love Tor. I love the Tor Project. I think the whole concept of a community-driven anonymity project with incredible scale and effectiveness is super cool. I must have first read some version of the Tor whitepaper a decade or more ago, and I've been fascinated ever since. No other problem space is quite the same. Tor is software and networking that must function against (essentially) every form of malicious actor, from your snoopin' neighbor to the NSA. Tor is under constant attack from criminal groups seeking to DoS rival drug markets, from the largest tech giants blocking armies of bots, and from international law enforcement shutting down sites and distributing malware to their users. All of this is being combated by a vibrant community of developers, activists, and lawyers in the name of freedom! What's not to love?

Because of all that, it's very important that Tor stays safe and secure for everyone. Part of that means it's essential for research on Tor to continue and progress. That's what this is: research. A year ago I came up with a hypothetical attack that would be much easier and lower-cost than the commonly discussed standard of timing attacks. Because it was just an idea, I didn't know if it would actually work or if there was some clever mechanism already in Tor that would prevent it. So I set out to do a ton of research into timing attacks on Tor and the solutions for them, then went deep into the specification to see if anything in there would prove this unachievable. What I found was that the approach appeared to be possible, so I set out to prove it, thereby adding to the research and giving folks a better sense of how easy this type of attack is.

The Attack

In short, this attack is an opportunistic de-anonymization of Onion Services through traffic correlation, requiring control of only a single Guard Relay. Breaking that down:

In greater detail, any low-latency anonymity network with Hidden Services (used here as a general term) will likely end up with key design features that enable this attack to some degree. Namely, to connect to the network some node has to act as the entry point, and that entry point will naturally learn the Hidden Service's real IP. On the other side, any Hidden Service is reachable by any client that has the Hidden Service identifier and sufficient connectivity. With these two features, an attacker who controls both ends (the client connecting in and the entry node the server happens to use) can correlate the network traffic and match Hidden Service identifiers (Onion Addresses, in this case) with real IPs. The fragility of the attack comes from two requirements: the attacker must know the Hidden Service identifier of a specific target, and the target must also happen to choose the attacker's relay as its entry point. These weaknesses ensure the attack (on a sufficiently diverse network) remains opportunistic, though nonetheless problematic for unlucky Hidden Services.
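
To make the correlation idea concrete, here's a toy sketch in Python (not the actual unhidden code): for each probe of a known Onion Address, record the set of new client IPs seen connecting to our Guard Relay during the probe window, then look for IPs that keep showing up across probes. The function name, data shapes, and example addresses below are all illustrative assumptions.

    from collections import Counter

    def correlate(probe_windows):
        """Count how often each newly-seen IP coincides with our probes.
        IPs that appear across most windows are candidates for the
        Onion Service's real address."""
        hits = Counter()
        for new_ips in probe_windows:
            hits.update(new_ips)
        return hits.most_common()

    # Example: new inbound IPs observed during three probe windows
    # (documentation-range addresses, purely illustrative).
    windows = [
        {"203.0.113.7", "198.51.100.23"},
        {"203.0.113.7", "192.0.2.90"},
        {"203.0.113.7"},
    ]
    print(correlate(windows))  # 203.0.113.7 stands out after repeated probes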

The Tool

unhidden is a program designed to live alongside a Guard Relay and to generate and collect correlations between Onion Addresses and IPs. Its operation goes roughly like this (a rough sketch follows the list):

  1. Load a list of Onion Addresses from a file into memory
  2. Spawn a logger sub-process that logs all new connections to the relay
  3. Spawn a visitor sub-process that is given an Onion Address and instructed to establish a connection with it
  4. Wait for the visitor to finish
  5. Stop the logger and gather all logged IPs
  6. Save the Onion Address and IPs to a database
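
To make those steps concrete, here's a minimal sketch of the loop in Python. To be clear, this is not the actual unhidden source: it assumes a relay ORPort of 9001 on the same host, a local Tor client with its SOCKS proxy on 127.0.0.1:9050, and placeholder file and table names, and it leans on psutil and requests (with the socks extra) for brevity.

    import multiprocessing, sqlite3, time
    import psutil     # connection snapshots (assumes the relay runs on this host)
    import requests   # needs requests[socks] for the socks5h proxy scheme

    OR_PORT = 9001                          # assumed ORPort of our Guard Relay
    SOCKS = "socks5h://127.0.0.1:9050"      # assumed local Tor SOCKS proxy

    def inbound_ips():
        """Remote IPs currently connected to our ORPort."""
        return {c.raddr.ip for c in psutil.net_connections("tcp")
                if c.raddr and c.laddr and c.laddr.port == OR_PORT}

    def logger(queue, stop):
        """Record IPs that connect to the relay after a baseline snapshot.
        (A real tool would also filter out known relay IPs from the consensus.)"""
        baseline = inbound_ips()
        seen = set()
        while not stop.is_set():
            seen |= inbound_ips() - baseline
            time.sleep(0.5)
        queue.put(seen)

    def visitor(onion):
        """Connect to the Onion Address through the local Tor client."""
        try:
            requests.get(f"http://{onion}/",
                         proxies={"http": SOCKS, "https": SOCKS}, timeout=60)
        except requests.RequestException:
            pass  # the connection attempt itself is what matters

    def probe(onion, db):
        queue, stop = multiprocessing.Queue(), multiprocessing.Event()
        log_proc = multiprocessing.Process(target=logger, args=(queue, stop))
        log_proc.start()                          # 2. start the logger
        visit_proc = multiprocessing.Process(target=visitor, args=(onion,))
        visit_proc.start()                        # 3. start the visitor
        visit_proc.join()                         # 4. wait for it to finish
        stop.set()                                # 5. stop the logger...
        ips = queue.get()
        log_proc.join()
        for ip in ips:                            # 6. ...and save what it saw
            db.execute("INSERT INTO hits (onion, ip, ts) VALUES (?, ?, ?)",
                       (onion, ip, time.time()))
        db.commit()

    if __name__ == "__main__":
        db = sqlite3.connect("unhidden.db")
        db.execute("CREATE TABLE IF NOT EXISTS hits (onion TEXT, ip TEXT, ts REAL)")
        with open("onions.txt") as f:             # 1. load the Onion Address list
            onions = [line.strip() for line in f if line.strip()]
        for onion in onions:
            probe(onion, db)

The real tool presumably handles timing windows and relay filtering more carefully, but the shape of the loop matches the six steps above.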

Although the structure is simple, the results are quite powerful. By filtering out all other relays and any pre-existing connections, we know that what remains are only Tor clients that newly connected to our relay during that short window. By repeating this many times over a long period (weeks or months), we can build reasonable suspicion about the correlations.
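
For example, once the database has a few weeks of probes in it, a simple aggregation is enough to surface the suspicious pairings. This assumes the placeholder hits table from the sketch above.

    import sqlite3

    db = sqlite3.connect("unhidden.db")
    # Rank (Onion Address, IP) pairs by how often they coincided across probes.
    query = ("SELECT onion, ip, COUNT(*) AS n FROM hits "
             "GROUP BY onion, ip ORDER BY n DESC LIMIT 20")
    for onion, ip, n in db.execute(query):
        print(f"{n:4d} coincidences  {ip:>15}  {onion}")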

Causation

This attack and tool do not prove causation. Although the results can be quite accurate and powerful, they can only be used as a spotlight, not as direct evidence. That's bad from the perspective of trying to de-anonymize an 'anonymous' server, because although it can lead to actual evidence (law enforcement imaging a server based on reasonable suspicion), it can't simply be handed to a prosecutor as 'proof'. However, it's very good from the perspective of keeping that anonymity, because it means Tor works! As I mentioned earlier, there's not much the Tor Project can do to stop or 'fix' this. It's simply an unwanted extra property of a low-latency network like Tor. I2P doesn't fix this either; from what I've seen, traffic correlation attacks there are generally even easier because of its 'everyone is a node' design. The best thing we can do is also the hardest: grow the network. The stronger the network grows, the more difficult the correlation becomes.

This has been a long journey for me, and I'm glad I could contribute more to the research around Tor. Although I already have ideas for taking this in more interesting directions, I'm hoping to take a nice break from it for a while so I can focus on something completely different!

Happy Hacking!
- Chris