RU against the Machine: The failures of machine translation for analysing Russian-language forums

Introduction

Russian slang such as ‘fenya’ and ‘mat’, along with cybercriminal-specific dialect, is used throughout hidden services, especially on forums. As with any other language, non-fluent researchers, journalists, analysts, and other forum members may need to translate Russian content to interact with the service. When they rely on machine translation (MTL), slang terms can cause content and context within forum posts to be missed.

Cybercriminal forums host a vast amount of conversation, which is where these terms are most likely to be found, and mistranslations are observed throughout them. English-language content is also written by non-native English speakers through MTL, resulting in signs of non-fluency. A key example of this is the LockBit Ransomware-as-a-Service affiliate rules, in which the word ‘rhubarb’ is used where ‘revenue’ was originally intended.

This blog aims to investigate the failures of MTL in the context of Russian-language cybercrime and explore the benefits of knowing common slang terms when navigating forums and creating actionable intelligence.

Cybercriminal locations

What are cybercriminal forums?

A cybercriminal forum is a marketplace and place of discussion focused on cybercrime, often hosted on the dark web. Cybercriminals use these forums to discuss and develop tactics, techniques, and procedures (TTPs), and to buy and sell data leaks, accesses, Proof of Concept (PoC) exploits, and more. Forum members can also engage in more technical discussions regarding the development of malicious tools. Forums are often split into sections dedicated to these areas of the threat landscape. As seen in Figure 1, this forum has sections dedicated to malware, social engineering, AI, cryptography, and more.

Figure 1 – Landing page of a prominent Russian-language cybercriminal forum.

Some of the most prominent sites have been active for decades. For example, the two most prominent forums, XSS and Exploit, have been running since 2013 and 2005, respectively. This diverse and active portion of the threat landscape contains vital actionable intelligence with global effects, meaning there are major benefits to understanding the native language. This benefit comes particularly from understanding the community terms and common slang phrases used throughout forums.

Russian-language cybercriminal locations, particularly forums such as XSS and Exploit, are often invite-only. Similarly, other forums ask that users pay to register and may have private sections only visible to certain members. Going further, some forums hide post content from all but a select few members. These forums have strict moderation and lay out comprehensive rules, with punishments for users who break them. Scammers, leechers, and those using poor grammar and punctuation can expect to receive warnings and even bans from the staff team.

Data-leak sites

Extortion groups commonly use data-leak sites (DLSs) to further extort victims, typically proceeding in multiple stages. The first threat is that the victim’s name and news of a successful attack against it will be published on the extortion group’s website. Should this fail to motivate a victim to pay a ransom, the group’s next step is typically to provide proof of the successful theft of its data. This can include screenshots of internal file trees, samples of employee or customer PII, or other sensitive documents. The group may add a countdown at this stage, noting that should the victim fail to pay before it ends, all stolen data will be made available to DLS visitors, either for free or at a cost.

Figure 2 – Landing page of ransomware group Akira’s DLS.

Why is translation software unreliable?

The main issue with machine translation is that it lacks the ability to interpret slang and contextual or cultural references. These algorithms often translate literally, word for word, applying context only for grammatical reasons. Slang and community terms are often left untranslated or given an incorrect meaning, leading to misunderstandings, mistranslations, and potentially entirely false results.

This report aims to explore these community phrases and commonly mistranslated slang used in the context of cybercrime. It will also explore how MTL is unable to contribute to translation of these terms and explore a case study on how cybercriminals use these types of services.

Common slang terms on Russian-language hidden services

General cybercriminal forum usage

лс

The term ‘лс’ is used throughout most, if not all, sections on cybercriminal forums. This term refers to private messaging and is often used to state a user’s contact preferences. In some initial access broker (IAB) listings, where threat actors sell initial access to organisations, the seller may invite negotiations over private messaging. This will often be included in the price area. Google Translate usually renders this term simply as “ls”, although in the right context it does occasionally translate it as “PM”, or private message. Another common Cyrillic version of this is ‘ПМ’ (PM). The term ‘в лс’ (v ls) means “via private message”. In Figure 3, for example, it tells a buyer to contact the seller by private message to discuss the sale.

Figure 3 – Russian-language IAB listing containing term ‘в лс’.

гарант

The phrase ‘гарант’ (garant) refers to the escrow services many prominent forums have. This system allows moderators and administrators to act as a middleman between a buyer and seller, ensuring the product is transferred and legitimate. It also ensures that funds are successfully released. MTL renders this term as “guarantee” or “guarantor”. Whilst this may be an acceptable translation, knowledge of forum escrow services is vital to understanding it in context. Sellers on cybercriminal forums may work only through an escrow service to increase their credibility and ensure that they are not scammed.

Figure 4 shows the seller stating, “I will give access in advance to people with reputation or with a deposit, the rest strictly through the guarantor”. This means that the user does not trust buyers without reputation or deposits on the forum.

Figure 4 – Russian-language IAB listing containing information about escrow service usage.

Мусор

Мусор (musor), directly translated to “garbage”, refers to the police. The common English-language version of this phrase is “fed”. Being accused of being a member of law enforcement could result in major reputation loss, bans from cybercriminal locations, and more retaliatory measures from other threat actors.

Figure 5 depicts an arbitration thread against prolific ransomware group LockBit. Possibly out of emotional escalation, the thread starter claimed that LockBit was a member of the Russian FSB and law enforcement. An excerpt from the chat logs shared as part of this thread (Figure 6) includes a quote from the original poster which reads “С мусором никто работать не будет!”. Machine translation renders this quote as “nobody will work with garbage!”. However, in this context, it is more likely that the poster is using the word as a derogatory term for law enforcement. In either reading the word is meant as an insult, but the latter has larger repercussions for a user’s presence on the forum.

Figure 5 – Screenshot of arbitration thread against LockBit.

Figure 6 – Excerpt from chat logs included in arbitration thread against LockBit.

Terms used in cybercriminal context

к, кк, and ккк

The terms ‘к’, ‘кк’, and ‘ккк’ are used to express quantities, both in Cyrillic and in their Latin equivalents ‘k’, ‘kk’, and ‘kkk’. Each ‘к’ stands for a factor of one thousand, so ‘к’ denotes thousands, ‘кк’ millions, and ‘ккк’ billions. They are often used when describing large numbers, such as those in the millions. Figure 7 is an IAB listing which advertises access to a range of organisations, stating the revenue for each. Similarly, ‘ккк’ or ‘kkk’ may be used to describe a revenue in the billions. This knowledge can aid in identifying victim organisations in IAB listings.

Figure 8 – Cybercriminal forum post containing multiple accesses.
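The suffix convention described above can be expanded mechanically. The following is a minimal sketch, assuming each trailing ‘к’ (Cyrillic) or ‘k’ (Latin) multiplies the figure by 1,000. The function name `parse_k_amount` and the accepted input formats are illustrative assumptions, not taken from any real listing or tool.

```python
import re
from typing import Optional

# Assumption: an optional currency sign, a number with an optional decimal part,
# then one to three 'к' (Cyrillic) or 'k' (Latin) suffix characters.
_AMOUNT = re.compile(
    r"^\s*\$?\s*(\d+(?:[.,]\d+)?)\s*([кk]{1,3})\s*\$?\s*$",
    re.IGNORECASE,
)


def parse_k_amount(text: str) -> Optional[int]:
    """Expand shorthand such as '5кк' into a whole number, or return None."""
    match = _AMOUNT.match(text)
    if match is None:
        return None
    # Accept both '.' and ',' as the decimal separator.
    number = float(match.group(1).replace(",", "."))
    # Each suffix character stands for a factor of one thousand.
    multiplier = 1000 ** len(match.group(2))
    return round(number * multiplier)


if __name__ == "__main__":
    for sample in ["15к", "5кк", "$300k", "1.5ккк"]:
        print(sample, "->", parse_k_amount(sample))
```

Normalising figures in this way makes it easier to compare an advertised revenue against public financial data when trying to narrow down a likely victim organisation.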

Авер

The word ‘Авер’ (Aver) refers to antivirus. A specific use of this can be found within the IAB market, where it is used to state what antivirus solution is active on the compromised network or host. An anglicised version of this term exists as “Aver”, which is used in the same context. Use of the anglicised term may indicate that a Russian-language user is attempting to masquerade as an English speaker, though there may also be a wider adoption of the term. Figure 9 depicts an IAB listing with Sophos antivirus active on the compromised host, specifically “авер софос” (aver sophos).

Figure 9 – Russian-language IAB listings stating the active antivirus software.
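One way to compensate for these gaps is to check the original Russian text against a small glossary before, or alongside, machine translation. The sketch below uses only the terms covered in this section. The names `GLOSSARY` and `annotate_slang` are hypothetical, and the prefix matching used to catch inflected forms such as ‘мусором’ is a crude assumption that will also flag innocent words sharing the same stem, so every hit still needs an analyst’s review.

```python
import re
from typing import Dict, List, Tuple

# term -> (meaning in forum context, whether inflected forms should match by prefix)
GLOSSARY: Dict[str, Tuple[str, bool]] = {
    "лс": ("private message (PM)", False),
    "гарант": ("forum escrow service", True),
    "мусор": ("police / law enforcement (derogatory)", True),
    "авер": ("antivirus", True),
}


def annotate_slang(text: str) -> List[Tuple[str, str]]:
    """Return (token, meaning) pairs for glossary terms found in Russian text."""
    notes: List[Tuple[str, str]] = []
    for token in re.findall(r"\w+", text.lower()):
        for term, (meaning, allow_prefix) in GLOSSARY.items():
            # Short abbreviations must match exactly; longer stems may be inflected.
            if token == term or (allow_prefix and token.startswith(term)):
                notes.append((token, meaning))
                break
    return notes


if __name__ == "__main__":
    # Quote from the arbitration thread discussed above.
    print(annotate_slang("С мусором никто работать не будет!"))
```

Annotations like these can be shown next to the machine-translated output so that a literal rendering such as “garbage” is not taken at face value.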

Benefits of knowing slang terms

Foreign-language translation can be vital for transforming first-party research into readable, actionable intelligence. It also aids in understanding threat actor communication, which can help with tracking and attribution. Knowledge of slang also shows analysts how a sock puppet account needs to write in order to pass as a legitimate forum user.

Russian-language knowledge for closed source intelligence

Arbitration and general discussion sections in particular may include slang and other community terms. Monitoring these sections can provide insight not only into cybercrime and criminality within these forums, but also into cybercriminals themselves, their use of language, and how community terms have been adopted.

As explored earlier in this report, knowing Russian terms when conducting analysis on cybercriminal forums can aid in assessing potential victims. In the IAB market, the vast majority of listings include the victim organisation’s revenue, which is denoted with ‘к’, ‘кк’, and ‘ккк’. By understanding this, it may be possible to attribute the listing to the victim organisation. Notifying the victim may then prevent the malicious use of such access.

Attribution of language from literal mistranslated slang

In some cases, a literal translation occurs and produces a word that is incorrect in the given context. For example, LockBit’s affiliate rules state “as well as any other organizations provided that they are private and have rhubarb”. This is almost certainly a mistranslation of the Russian word ‘ревеню’ (revenyu). Translated literally, the word means rhubarb, as it is an inflected form of ‘ревень’. Phonetically, however, it sounds like the English word ‘revenue’, which is what the author most likely intended.

Figure 10 – Excerpt from LockBit’s affiliate rules.

When a Russian-language individual attempts to pass as an English speaker or writes content in the language, mistranslations can help identify the poster as a non-native English speaker. In this case, LockBit’s affiliate rules contain the literal translation “rhubarb”. From this single word, it becomes clear that the rules were most likely machine translated into English from Russian. This is confirmed at the bottom of the page, which explains that it was translated via Google Translate, which resulted in the error. From this, however, it is possible to attribute a threat group to a specific geography, which may aid in subsequent tracking and investigation.

Figure 11 – Excerpt from LockBit’s affiliate rules.
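The same idea can be applied in the other direction, by scanning English text for words that are known literal renderings of Russian slang. The sketch below is illustrative only. The table `LITERAL_RENDERINGS` contains just the examples discussed in this post, the function name `flag_literal_renderings` is hypothetical, and a hit is a prompt for manual review rather than proof of machine translation.

```python
import re
from typing import Dict, List, Tuple

# English word -> likely intended meaning and Russian source term,
# limited to the examples covered in this post.
LITERAL_RENDERINGS: Dict[str, str] = {
    "rhubarb": "revenue (from 'ревеню')",
    "garbage": "police / law enforcement (from 'мусор')",
    "guarantor": "forum escrow service (from 'гарант')",
}


def flag_literal_renderings(english_text: str) -> List[Tuple[str, str]]:
    """Return (word, likely meaning) pairs for suspicious words in English text."""
    hits: List[Tuple[str, str]] = []
    for word in re.findall(r"[a-z]+", english_text.lower()):
        if word in LITERAL_RENDERINGS:
            hits.append((word, LITERAL_RENDERINGS[word]))
    return hits


if __name__ == "__main__":
    rule = "as well as any other organizations provided that they are private and have rhubarb"
    print(flag_literal_renderings(rule))
```

Because words such as “garbage” are common in ordinary English, a check like this is only useful as a first-pass filter before an analyst reads the surrounding context.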

Conclusion

Written language can be viewed as a way to monitor threat groups and potentially lead to attribution. As a result, threat actors may choose to write in different languages to provide a level of operational security. However, if not fluent in the language, they may use machine translation software, which has notable and identifiable issues with slang and community terms.

For analysts active on cybercriminal locations, understanding slang is vital to gaining a complete understanding of cybercriminal discussions. It also aids in maintaining a reputable and coherent facade when masquerading as a Russian criminal.

Overall, MTL cannot solely be relied upon when analysing Russian-language threat intelligence. This should especially be considered in the context of conversational language used on cybercriminal forums and similar areas of the threat landscape.

Cyjax is proud to have an analyst team which not only makes use of a variety of translation tools but also has a broad knowledge of languages, including Russian. Through this, Cyjax is able to gain a deeper insight into the world of cybercrime and analyse the relevant threats to your business.

To learn more about what Cyjax can do for you, request a demo here.
