Researchers claim that the majority of the dark web exists to facilitate criminal activities, including the drug trade, financial fraud, and illegal pornography. This article explores the different methods researchers have experimented with to help identify and de-anonymise the vendors of dark web marketplaces.
Dark web marketplaces such as Tor2Door and White House specialise in the drug trade. Like legitimate websites, these marketplaces connect vendors and customers on an eBay-like trading platform. Because the markets run as hidden services on the Tor network, de-anonymising their vendors, customers, and operators is rather difficult.
In various studies, analysts have experimented with a range of fingerprinting techniques (e.g. deep neural networks) to build uniquely identifying fingerprints of dark web marketplace vendors. Such a fingerprint could help law enforcement with two things:
- Assess the scale of sales of a particular seller; and
- Identify the real name and whereabouts of the perpetrator.
The following report explores which fingerprinting methodologies researchers have tried, how accurate the various approaches are, how they compare with each other, and how further research could improve them.
Dark Vendor Profiling Framework
Researchers found that they can build a digital fingerprint from a dark web vendor’s public profile characteristics. Although vendor fingerprinting alone does not necessarily reveal the real identity of a dark web vendor, it may allow researchers and law enforcement alike to tell whether the same person controls two vendor accounts. This information can be helpful in criminal prosecutions because it indicates how many vendor accounts were under the defendant’s control and, therefore, the true scale of the defendant’s illegal activity.
In her research, Jeziorowski has developed a Dark Vendor Profiling (DVP) framework to help attribute product listings on dark web marketplaces. She argues that the combination of specific characteristics of a vendor can create a fingerprint uniquely identifying the trader.
A fingerprint taken from one marketplace with the DVP framework allows researchers to identify user accounts on other marketplaces that belong to the same person. DVP can also link together multiple user accounts that one criminal operates on the same market.
In summary, Jeziorowski suggests that the DVP framework can help link multiple user accounts on the dark web to the same individual. Due to the anonymous nature of the dark web, attribution is an essential procedure when it comes to the legal prosecution of criminals trading on the dark web. Therefore, Jeziorowski theorises that investigations (led by law enforcement and intelligence agencies) could benefit from the fingerprinting features of her DVP framework.
How the DVP Framework Operates
Jeziorowski relies on three main characteristics of a vendor profile for building the unique fingerprint with DVP: writing styles, product attributes and image-based features.
Writing style analysis, also known as stylometry, is an authorship analysis technique for determining whether the same person wrote two pieces of text. Jeziorowski uses a dark web vendor’s product names and product descriptions as the input to her authorship analysis.
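Jeziorowski’s exact stylometric features are not listed here, so the following is only a minimal sketch of one common stylometric technique: comparing the character trigram frequency profiles of two texts with cosine similarity. The function names, the choice of n = 3 and the sample listing texts are illustrative assumptions, not part of the DVP framework.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalised text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse n-gram count vectors (0.0 if either is empty)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Stylometric similarity of two texts, from 0.0 (no shared n-grams) to 1.0."""
    return cosine_similarity(char_ngrams(text_a, n), char_ngrams(text_b, n))


# Hypothetical listing texts, for illustration only.
listing_a = "Top quality product, fast shipping, stealth packaging!!"
listing_b = "Top quality product, fast shipping, discreet packaging!!"
listing_c = "Order now. Guaranteed delivery worldwide."
print(style_similarity(listing_a, listing_b))  # relatively high
print(style_similarity(listing_a, listing_c))  # much lower
```

Character n-grams are a common choice for short texts such as product descriptions because they capture punctuation habits and spelling quirks without requiring long passages; a real model would combine many such features.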
Secondly, Jeziorowski feeds DVP with a range of product attributes (e.g. base product name, product shipping details, product category and subcategory names). Lastly, image-based features like EXIF data are also fed into the DVP model.
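How DVP weighs these attributes is not described here. As a hedged illustration, assume each listing’s attributes (including an EXIF camera model) are turned into normalised "field=value" tokens and two listings are compared with Jaccard similarity, the share of tokens they have in common. The field names and placeholder values below are hypothetical.

```python
def attribute_tokens(listing: dict) -> set:
    """Normalise a listing's attributes into 'field=value' tokens (None values skipped)."""
    return {
        f"{field}={str(value).strip().lower()}"
        for field, value in listing.items()
        if value is not None
    }


def jaccard(a: set, b: set) -> float:
    """Share of tokens two listings have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical field names and placeholder values, for illustration only.
listing_a = {"base_product": "Item A", "ships_from": "NL", "category": "Category 1",
             "subcategory": "Sub 1", "exif_camera_model": "Model X"}
listing_b = {"base_product": "item a", "ships_from": "nl", "category": "Category 1",
             "subcategory": "Sub 2", "exif_camera_model": "Model X"}

# 4 shared tokens out of 6 distinct tokens
print(jaccard(attribute_tokens(listing_a), attribute_tokens(listing_b)))
```

Lower-casing and trimming values before comparison is one small example of the cleaning and normalisation work that makes a multi-input model like DVP laborious to run.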
Jeziorowski reports that these three characteristics allowed her DVP framework to link vendor accounts across different dark web marketplaces with 88% accuracy.
The main shortcoming of her vendor profiling methodology is the relative complexity of the DVP model, which requires a large number of inputs to be retrieved, cleaned, normalised, and fed into the framework before it produces a good result.
Fingerprinting with Deep Learning Technologies
Wang et al. take image-based fingerprinting further than Jeziorowski. Whereas Jeziorowski relies on the metadata of product listing images, such as EXIF data (among other inputs), Wang et al. analyse the product images themselves with a range of deep learning techniques to profile dark web dealers.
The authors hypothesise that criminals can be uniquely identified by the high-level photographic features of their product listing photos. Specifically, Wang et al. point out that photo composition features, such as camera angles, scene backgrounds and latent photography styles, can help attribute a product image to its photographer. Combining these photographic properties can create a unique fingerprint of a dark web vendor, Wang et al. argue.
Image Deep Analysis Accuracy
This photo analysis approach requires deep neural networks and an extensive training dataset to work accurately. Therefore, Wang et al. relied on a public dataset of photos scraped from the dark web to train their deep neural networks. The size of this dataset also allowed them to test several prediction models against each other for accuracy.
In a nutshell, the researchers found that ResNet was the most accurate of the candidate deep neural network architectures, with accuracy ranging between 87.10% and 93.20%. In other words, the model attributed product photos to the correct vendor in the large majority of test cases. These results allowed Wang et al. to identify sellers with duplicate accounts on the same dark web marketplace, as well as the same vendor across different dark web marketplaces.
The authors also found that both cross-market and intra-market detection accuracy range between 87% and 90%. These figures are similar to the 88% accuracy of Jeziorowski’s DVP framework, which relied on a broader range of inputs than the method of Wang et al.
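Wang et al.’s exact pipeline is not reproduced here. As a hedged illustration, assume a pretrained CNN such as ResNet has already turned each product photo into an embedding vector (that step needs a deep learning library and is omitted). Duplicate-account detection can then be sketched as averaging each account’s photo embeddings into a profile and flagging account pairs whose cosine similarity passes a threshold. The account names, the toy three-dimensional vectors and the 0.9 threshold are hypothetical; real embeddings have hundreds or thousands of dimensions.

```python
import math
from itertools import combinations


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average several photo embeddings into one account-level profile."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_duplicates(accounts: dict[str, list[list[float]]],
                         threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (account, account, score) pairs whose profiles are >= threshold similar."""
    profiles = {name: mean_vector(vecs) for name, vecs in accounts.items()}
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        score = cosine(profiles[a], profiles[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical embeddings, as if produced by a pretrained CNN.
accounts = {
    "market1/vendorA": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    "market2/vendorX": [[0.85, 0.15, 0.05]],
    "market1/vendorB": [[0.0, 0.2, 0.9]],
}
print(candidate_duplicates(accounts))  # only vendorA and vendorX are paired
```

Because the comparison only needs embeddings, the same function covers both the intra-market case (two accounts on one market) and the cross-market case.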
Writing Style Analysis
Finally, Wang et al. compared the accuracy of their deep neural network image analysis with a stylometry-based approach. Like Jeziorowski, they attempted to establish the authorship of product descriptions by analysing the dark web merchant’s writing style. Wang et al. found that the accuracy of their stylometry approach ranges between 58.60% and 99.00%, depending on the attributes of the product listing.
Wang et al. concluded that the image-based approach provides higher accuracy than the writing style analysis method. They also found that image analysis requires less processing time than the stylometric approach.
Vendor Profiling with Text Analysis
In his research, Chen also explores the stylometry approach and claims that stylometry can characterise the author: according to Chen, writing style can indicate the author’s gender, education level, and cultural background.
Chen compares various analysis techniques in his book and reports 88.00% to 97.00% accuracy in predicting authorship with support vector machines (SVMs). He also builds a model for predicting the author’s gender, which achieves 59.70% to 86.90% accuracy depending on several variables.
Although text analysis can reach high accuracy in ideal scenarios, the stylometry approach fails under certain circumstances. A limitation of Chen’s research in the context of marketplace vendors is that his results come from analysing extremist content on the dark web rather than illegal marketplaces. Although illegal marketplaces and extremist communities both use the dark web as a platform, they produce different types of text, so Chen’s approach needs further research before it can be applied to vendor listings.
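Chen’s feature set and SVM configuration are not given here, so the following is only a sketch of how a linear SVM separates texts by two candidate authors. To stay dependency-free, it trains the SVM with the Pegasos subgradient method, a swapped-in training algorithm (in practice a library implementation such as scikit-learn’s `LinearSVC` would be more typical). The two features, hypothetical rates of "whilst" and "while" per 1,000 words, stand in for the much richer function-word and character statistics a real stylometric model would use.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200):
    """Train a linear SVM with Pegasos-style subgradient descent.

    samples: feature vectors; labels: +1 for author A, -1 for author B.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (for simplicity the bias is regularised along with the other weights).
    Samples are visited in a fixed order, so training is deterministic.
    """
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularisation (shrink) step
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (author A) or -1 (author B) for a feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical features: ["whilst" per 1,000 words, "while" per 1,000 words].
author_a = [[2.0, 0.1], [1.8, 0.3], [2.2, 0.0]]
author_b = [[0.1, 2.0], [0.2, 1.7], [0.0, 2.1]]
samples = author_a + author_b
labels = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(samples, labels)
print(predict(w, [2.1, 0.2]))  # attributed to author A
print(predict(w, [0.1, 1.9]))  # attributed to author B
```

Authorship attribution across many vendors would need a multi-class extension, for example one such classifier per candidate author.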
First, none of the reviewed literature explored fingerprinting techniques beyond the stylometric analysis of writing style, image analysis and, in Jeziorowski’s case, product attributes. In my opinion, vendors have other characteristics that could allow researchers to profile them more accurately than the reviewed approaches do. My theory is that the more attributes are fed into dark web profiling algorithms, the more accurate vendor profiling becomes.
For example, written customer feedback could be a valuable data source for profiling a dark web vendor. As on legitimate e-commerce sites such as Amazon and eBay, customers on dark web marketplaces can publicly rate a vendor after every purchase, using a five-star system and written feedback. I assume that star ratings and written customer feedback can enrich dark web marketplace vendor fingerprinting models.
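One way to test this assumption would be to turn each vendor’s star ratings into a small numeric feature. The hypothetical sketch below, which is not taken from the reviewed papers, converts ratings into a normalised 1-to-5-star distribution and compares two vendors with total variation distance. On its own, a rating distribution is a weak signal, since many unrelated vendors have similar ratings, so it would only be one input alongside text and image features.

```python
from collections import Counter


def rating_profile(ratings: list[int]) -> list[float]:
    """Share of a vendor's reviews at each star level, ordered 1 to 5 stars."""
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    if not ratings:
        return [0.0] * 5
    counts = Counter(ratings)
    return [counts[star] / len(ratings) for star in range(1, 6)]


def profile_distance(p: list[float], q: list[float]) -> float:
    """Total variation distance: 0.0 for identical profiles, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical rating histories for two vendor accounts.
vendor_a = rating_profile([5, 5, 5, 4, 5, 1])
vendor_b = rating_profile([5, 4, 5, 5, 5, 1])
print(profile_distance(vendor_a, vendor_b))  # 0.0: same rating distribution
```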
Fingerprinting Based on PGP Keys
Another characteristic that researchers could have explored further is the vendors’ PGP keys. Dark web marketplaces usually require vendors to publish the public part of their PGP key so that customers can encrypt messages meant for the seller.
I hypothesise that dark web vendors tend to reuse their PGP keys across marketplaces to simplify their key management practices. Therefore, public PGP keys could help identify related accounts as long as the same person uses the same PGP key across different marketplaces.
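A minimal sketch of checking for key reuse, assuming each account’s public key has been scraped as an ASCII-armored block, is to strip the armor (header lines, line wrapping and checksum), hash what remains and group accounts with the same digest. Note that this SHA-256 digest is a simplification and is not the OpenPGP fingerprint; a more robust approach would parse the keys, for example by importing them into GnuPG, and compare the fingerprints it reports, because re-exporting the same key with extra signatures or user IDs changes the armored text. The key strings below are placeholders, not real keys.

```python
import hashlib
from collections import defaultdict


def normalise_armored_key(armored: str) -> str:
    """Return only the base64 body of an ASCII-armored public key block.

    Drops the BEGIN/END lines, armor headers such as 'Version:' or 'Comment:',
    blank lines, and the '=XXXX' checksum line, then joins the body lines so
    that different line wrapping does not change the result.
    """
    body = []
    in_headers = False
    for raw in armored.strip().splitlines():
        line = raw.strip()
        if line.startswith("-----BEGIN"):
            in_headers = True
            continue
        if line.startswith("-----END"):
            break
        if in_headers:
            if ": " in line:
                continue                  # armor header line
            in_headers = False            # first non-header line ends the headers
        if not line:
            continue
        if line.startswith("=") and len(line) == 5:
            continue                      # checksum line
        body.append(line)
    return "".join(body)


def key_digest(armored: str) -> str:
    """SHA-256 of the normalised key body. NOT the OpenPGP fingerprint."""
    return hashlib.sha256(normalise_armored_key(armored).encode("utf-8")).hexdigest()


def accounts_sharing_keys(accounts: dict[str, str]) -> list[list[str]]:
    """Group account names that published the same normalised public key."""
    groups = defaultdict(list)
    for account, armored in accounts.items():
        groups[key_digest(armored)].append(account)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Placeholder key blocks (not real keys): same body, different headers and wrapping.
KEY_MARKET_1 = """-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v2

mQENBFplaceholderAAAA
placeholderBBBB
=abcd
-----END PGP PUBLIC KEY BLOCK-----"""

KEY_MARKET_2 = """-----BEGIN PGP PUBLIC KEY BLOCK-----
Comment: my shop

mQENBFplaceholderAAAAplaceholderBBBB
=abcd
-----END PGP PUBLIC KEY BLOCK-----"""

OTHER_KEY = """-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFsomethingElseCCCC
=wxyz
-----END PGP PUBLIC KEY BLOCK-----"""

accounts = {"market1/vendorA": KEY_MARKET_1,
            "market2/vendorQ": KEY_MARKET_2,
            "market1/vendorB": OTHER_KEY}
print(accounts_sharing_keys(accounts))  # [['market1/vendorA', 'market2/vendorQ']]
```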
Links to the Surface Web
Secondly, none of the authors explored whether the fingerprint can identify related user accounts on the surface web or not. For instance, dark web vendors are likely to have user accounts on legitimate marketplaces (e.g. eBay, Gumtree).
Most surface web marketplaces work like their dark web counterparts: the seller creates a user profile, writes a short bio, and publishes photos and descriptions of the products for sale. These details from the surface web could also be fed into a fingerprinting algorithm.
My theory is that the real identity of dark web merchants can potentially be revealed by creating a fingerprint of the user accounts on the surface web and the dark web. The fingerprints could be compared with each other, and if there is a match, the dark web account is likely to be controlled by the same person as the surface web counterpart.
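Comparing a dark web fingerprint with a surface web one could be sketched as a simple late fusion: compute a similarity score per modality (for example, from a writing-style comparison or an image-embedding comparison), take a weighted average, and flag pairs above a threshold. The modality names, weights and 0.8 threshold below are hypothetical and would need calibrating on labelled account pairs; a flagged pair would be a lead for investigators, not proof of identity.

```python
def combined_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-modality similarity scores in [0, 1].

    Modalities without a score (e.g. no photos on the surface web account)
    are skipped, and the remaining weights are renormalised.
    """
    used = {m: w for m, w in weights.items() if m in scores}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(scores[m] * w for m, w in used.items()) / total


def is_likely_same_person(scores: dict[str, float], weights: dict[str, float],
                          threshold: float = 0.8) -> bool:
    """Flag a dark web / surface web account pair for manual review."""
    return combined_score(scores, weights) >= threshold


# Hypothetical weights and per-modality scores, for illustration only.
WEIGHTS = {"writing_style": 0.4, "photos": 0.4, "attributes": 0.2}
pair_1 = {"writing_style": 0.9, "photos": 0.85}  # no attribute score available
pair_2 = {"writing_style": 0.3, "photos": 0.4, "attributes": 0.5}
print(is_likely_same_person(pair_1, WEIGHTS))  # True  (score about 0.875)
print(is_likely_same_person(pair_2, WEIGHTS))  # False (score about 0.38)
```

Renormalising the weights matters here because surface web accounts will often lack some of the inputs available on the dark web side.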
I believe the most significant finding of the papers was the high accuracy of the dark web vendor fingerprints. Each dark vendor profiling approach was able to identify multiple user accounts of the same dark web vendor with high confidence. Writing style analysis usually works best when both the training data and the text sample are long. Nonetheless, the authors achieved a high level of accuracy despite the short product descriptions and the small amount of text written by each vendor.
Another surprising finding was how little difference a more complex image analysis procedure made. While Jeziorowski relied on a relatively simple analysis of EXIF metadata, Wang et al. chose a range of complex deep learning algorithms for the fingerprinting process. Although the latter is much more sophisticated, its accuracy was not significantly higher than that of Jeziorowski’s naïve method.
How Law Enforcement Can Benefit
I reckon that a specific combination of the various vendor fingerprinting methodologies can be indispensable for law enforcement investigations. First, vendor fingerprinting allows the prosecution in a court case to demonstrate the extent of a defendant’s illegal drug trade on the dark web. For example, a defendant may claim that she had a presence on a single dark web marketplace with only a handful of illegal transactions. However, dark web profiling can reveal whether the defendant controlled multiple user accounts in an attempt to hide the true scale of her marketplace activity.
Furthermore, in my opinion, fingerprinting can potentially help law enforcement at the investigation stage. Investigators can fingerprint the dark web vendor first and then look for surface web accounts with a similar digital fingerprint. For instance, if a dark web vendor decides to sell products on eBay, she may use the same camera to photograph her legitimate products as she uses for the illegal drugs. As Wang et al. pointed out, the characteristics of a product photo, such as camera angles, scene backgrounds and latent photography styles, can be revealing. Therefore, if the dark web vendor uses the same camera, investigators may find her on the surface web by looking for public photos with the same characteristics.
In the above papers, researchers demonstrated that the characteristics of a dark web vendor user profile could uniquely identify a vendor on the dark web. Firstly, this unique fingerprint can pinpoint vendors with multiple accounts in the same market. Secondly, the fingerprint can identify vendor accounts on different marketplaces belonging to the same person. The ability to link user accounts to the same person could be helpful in the prosecution process because it can demonstrate the precise scale of the illegal trade associated with the defendant.
Research suggests that the accuracy of the dark web vendor profiling algorithms is generally high. Although different fingerprinting methodologies exist with varying levels of assurance, researchers found that the accuracy can be as high as 99.00% under the right circumstances.
This article was based on my essay written for a course at Charles Sturt University. Cover photo by Benjamin Lehman.