Introducing the IDA Pro Translator Plugin - kyrus/ida-translator GitHub Wiki

During malware analysis, it’s common to find error messages and other strings embedded in the binary. These strings offer a shortcut to understanding what a section of code does, and can sometimes help attribute a sample to an entity or individual. Often these strings are in English, but many malware samples contain messages in other languages, stored in non-ASCII encodings. Unfortunately, IDA Pro's support for such strings is limited. Newer versions of IDA Pro can handle strings encoded in UTF-8 and UTF-16, but text in most writing systems will not display correctly with the default font, and other encodings aren't supported at all. (For more information on character encodings and Unicode, see my talk You Don’t Know ʞɔɐɾ About Unicode.)

The Old Way

You can copy the bytes out of IDA (with the help of a small IDC script) and feed them to Python’s chardet module to determine the probable character encoding (if it isn't already known). Next, you decode the bytes from that encoding into Unicode code points in Python and print the original string to the console. Of course, this doesn't work in the default Windows command prompt, which can't display most Unicode characters, so you either write the string to a file and open it in Notepad or perform this step on a Unix box.

Once you’ve done all that, you can finally paste the string into Google Translate and get an English translation. That's an awful waste of time when you're sitting in front of a multi-core, multi-gigahertz machine that is waiting patiently for you to shuttle text from one window to another. Wouldn't it be nice to have the machine do all the work for you?
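For readers who still want to script the decoding step by hand, here is a minimal sketch of it in C rather than Python. It assumes a POSIX system whose iconv(3) knows the source encoding (glibc does; on macOS you may need to link with -liconv), and the function name decode_to_utf8 is hypothetical. It only converts bytes in an already-known encoding to UTF-8; it does not guess the encoding, which is the job chardet does in the workflow above.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `in_len` bytes in encoding `from_enc`
 * (e.g. "KOI8-R") into NUL-terminated UTF-8 in `out`.
 * Returns the number of UTF-8 bytes written (excluding the NUL), or -1
 * if iconv doesn't know the encoding, the input is invalid or
 * incomplete in that encoding, or `out` is too small. */
long decode_to_utf8(const char *from_enc, const char *in, size_t in_len,
                    char *out, size_t out_cap)
{
    if (out_cap == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return -1;                       /* encoding not supported */

    char *inp = (char *)in;              /* iconv does not modify the input */
    size_t in_left = in_len;
    char *outp = out;
    size_t out_left = out_cap - 1;       /* reserve room for the NUL */

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    if (rc != (size_t)-1)                /* flush any pending shift state */
        rc = iconv(cd, NULL, NULL, &outp, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return (long)(outp - out);
}
```

For instance, passing "KOI8-R" and the first six bytes of the Russian sample below ("\xf1 \xcd\xcf\xc7\xd5") should produce the 11-byte UTF-8 string "Я могу", which a UTF-8 terminal or file can display.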

A Sample Program

To demonstrate the translator plugin, here is a sample program containing strings in a variety of character encodings. The text is taken from the UTF-8 sampler page at http://www.columbia.edu/~fdc/utf8/; each string is a translation of the phrase "I can eat glass and it doesn't hurt me" into a different language.

```c
#include <stdio.h>

int main (int argc, char **argv)
{
    // samples from http://www.columbia.edu/~fdc/utf8/
    // Russian in KOI8-R: Я могу есть стекло, оно мне не вредит.
    printf ("\xf1 \xcd\xcf\xc7\xd5 \xc5\xd3\xd4\xd8 \xd3\xd4\xc5\xcb\xcc\xcf"
        ", \xcf\xce\xcf \xcd\xce\xc5 \xce\xc5 \xd7\xd2\xc5\xc4\xc9\xd4.\n");

    // Chinese in gb2312: 我能吞下玻璃而不伤身体。
    printf ("\xce\xd2\xc4\xdc\xcd\xcc\xcf\xc2\xb2\xa3\xc1\xa7\xb6\xf8\xb2"
        "\xbb\xc9\xcb\xc9\xed\xcc\xe5\xa1\xa3\n");

    // Japanese in shift-jis: 私はガラスを食べられます。それは私を傷つけません。
    // (string literals are split where the next character is a hex digit
    // or a letter, so it isn't absorbed into the preceding \x escape)
    printf ("\x8e\x84\x82\xcd\x83""K\x83\x89\x83X\x82\xf0\x90""H\x82\xd7"
        "\x82\xe7\x82\xea\x82\xdc\x82\xb7\x81""B\x82\xbb\x82\xea\x82\xcd"
        "\x8e\x84\x82\xf0\x8f\x9d\x82\xc2\x82\xaf\x82\xdc\x82\xb9\x82\xf1\x81""B\n");

    // Korean in euc-kr: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
    printf ("\xb3\xaa\xb4\xc2 \xc0\xaf\xb8\xae\xb8\xa6 \xb8\xd4\xc0\xbb "
        "\xbc\xf6 \xc0\xd6\xbe\xee\xbf\xe4. \xb1\xd7\xb7\xa1\xb5\xb5 "
        "\xbe\xc6\xc7\xc1\xc1\xf6 \xbe\xca\xbe\xc6\xbf\xe4\n");

    // Hebrew in windows-1255: אני יכול לאכול זכוכית וזה לא מזיק לי.
    printf ("\xe0\xf0\xe9 \xe9\xeb\xe5\xec \xec\xe0\xeb\xe5\xec "
        "\xe6\xeb\xe5\xeb\xe9\xfa \xe5\xe6\xe4 \xec\xe0 \xee\xe6\xe9\xf7 "
        "\xec\xe9.\n");

    // Belarusian in windows-1251: Я магу есці шкло, яно мне не шкодзіць.
    printf ("\xdf \xec\xe0\xe3\xf3 \xe5\xf1\xf6\xb3 \xf8\xea\xeb\xee, "
        "\xff\xed\xee \xec\xed\xe5 \xed\xe5 \xf8\xea\xee\xe4\xe7\xb3\xf6\xfc.\n");

    // Thai in tis-620: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
    printf ("\xa9\xd1\xb9\xa1\xd4\xb9\xa1\xc3\xd0\xa8\xa1\xe4\xb4\xe9 "
        "\xe1\xb5\xe8\xc1\xd1\xb9\xe4\xc1\xe8\xb7\xd3\xe3\xcb\xe9\xa9"
        "\xd1\xb9\xe0\xa8\xe7\xba\n");

    // Hungarian in windows-1250: Meg tudom enni az üveget, nem lesz tőle bajom.
    printf ("Meg tudom enni az \xfc""veget, nem lesz t\xf5""le bajom.\n");

    // Icelandic in utf-8: Ég get etið gler án þess að meiða mig.
    printf ("\xc3\x89""g get eti\xc3\xb0 gler \xc3\xa1n \xc3\xbe"
        "ess a\xc3\xb0 mei\xc3\xb0""a mig.\n");

    return 0;
}
```

The IDA Pro Translator Plugin

The IDA Pro translator plugin automates both steps of the process: (1) determining the character encoding of a given string, and (2) translating that string to English. In addition, it keeps track of all the translations you’ve performed on an IDB file for easy export (say, to a human language expert).
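The plugin's detection logic isn't described here, and chardet itself relies on statistical models of byte frequencies. As a much cruder stand-in for step (1), the following sketch (hypothetical function names, POSIX iconv assumed) only rules out candidate encodings in which the bytes are invalid. It also illustrates why real detectors need statistics: single-byte encodings such as KOI8-R or windows-1251 assign a character to nearly every byte value, so they almost never fail this test.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Returns 1 if all `len` bytes of `in` are valid text in encoding `enc`,
 * 0 if they are not, and -1 if iconv does not know the encoding. */
static int decodes_cleanly(const char *enc, const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", enc);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;              /* iconv does not modify the input */
    size_t in_left = len;
    char scratch[256];                   /* converted output is discarded */
    int ok = 1;

    while (in_left > 0) {
        char *outp = scratch;
        size_t out_left = sizeof scratch;
        if (iconv(cd, &inp, &in_left, &outp, &out_left) == (size_t)-1
            && errno != E2BIG) {
            ok = 0;                      /* EILSEQ or EINVAL: invalid here */
            break;
        }
        /* E2BIG: scratch is full, so reuse it and keep converting */
    }
    iconv_close(cd);
    return ok;
}

/* Naive filter (not chardet's algorithm): copy into `matches` every
 * candidate encoding in which the bytes decode without error and
 * return how many were kept. `matches` must hold `ncands` entries. */
size_t candidate_encodings(const char *in, size_t len,
                           const char *const *cands, size_t ncands,
                           const char **matches)
{
    size_t n = 0;
    for (size_t i = 0; i < ncands; i++)
        if (decodes_cleanly(cands[i], in, len) == 1)
            matches[n++] = cands[i];
    return n;
}
```

With the candidates "UTF-8" and "KOI8-R", the start of the Russian sample is rejected as UTF-8 (0xF1 begins a four-byte sequence that a space cannot continue) and kept as KOI8-R, while UTF-8 bytes such as "\xc3\x89g" pass both checks, leaving the ambiguity for a smarter method to resolve.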

For example, loading the compiled sample program into IDA gives the following output for the first string:

Not too helpful. Press Control-R and the translation dialog pops up. Note that IDA Pro often tries very hard to turn byte sequences in the data section into "strings", and the items it defines may not line up with the actual text. By default, the plugin ends the string where the next defined item (i.e., a named entity) begins. Therefore, be sure to undefine any errant strings before pressing the hotkey, or highlight the extent of the string manually before pressing Control-R.
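The default extent rule can be modeled in a few lines. This is a hypothetical C sketch, not the plugin's code: offsets stand in for addresses, `defined` lists the offsets where IDA has a defined item, and treating an item at the start offset as the string's own label is an assumption. It shows why an errant defined item inside the text truncates the string.

```c
#include <stddef.h>

/* Hypothetical model of the default extent rule: a string starting at
 * `start` runs until the nearest defined item after it, or up to
 * `end_of_data` if there is none. `defined` holds the offsets of
 * defined (e.g. named) items and need not be sorted. */
size_t string_extent(size_t start, size_t end_of_data,
                     const size_t *defined, size_t ndefined)
{
    size_t end = end_of_data;
    for (size_t i = 0; i < ndefined; i++) {
        /* assumption: an item defined at `start` is the string itself */
        if (defined[i] > start && defined[i] < end)
            end = defined[i];
    }
    return end;
}
```

For example, with items defined at offsets 0, 40, and 64, a string starting at 0 ends at 40. If IDA has also defined an errant string at offset 12, the same string is cut off at 12, which is why undefining such items, or highlighting the string by hand, matters.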

Now you can see the result:

Repeat this for each string, and the Translations dockable window collects all of the translations you've created for this IDB file:
