Unlocking Memories: Extracting QQ NT Chat History from Encrypted Databases
Problem Formulation
Memories are precious gifts for human beings. While most people keep their important physical belongings in a carefully chosen place, few take the same care to preserve their digital property.
Tencent QQ is one example. This Chinese real-time chat program has been popular for decades, yet there is still no good way to store the data people generate on the platform. For now, taking full control of that private data means exporting it from the program's database.
Unfortunately, there is no official interface for exporting chat history from the program, which is ridiculous and irresponsible. If one does not strictly follow the provided data transfer method, the data may be permanently lost.
Everyone is entitled to full control over their own private data, so this post tries to improve a situation in which enterprises treat a person's private data with scorn.
Data File Structure
QQ NT introduces a brand-new data layout on top of a rebuilt framework that is said to use modern hardware “effectively”.
⚠️ This article is written on macOS Sonoma 14.4, so all the environments in this article are based on macOS. Adapt the paths and commands to your own system where necessary.
Basically, the data files of a single user are located in /Users/gilbert/Library/Containers/com.tencent.qq/Data/Library/Application Support/QQ/nt_qq_{MD5}, where {MD5} is a lowercase 32-character MD5 string. For example, the folder may be named nt_qq_9f59fe26a5631419b727d23340f009b0.
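To make the naming rule concrete, here is a minimal Python sketch that lists the per-account folders under the QQ data directory. The function name find_account_dirs and the constant ACCOUNT_DIR_PATTERN are made up for this illustration; only the nt_qq_{MD5} pattern comes from the layout described above.

```python
import re
from pathlib import Path

# Pattern described in the article: "nt_qq_" followed by a lowercase,
# 32-character hexadecimal MD5 string.
ACCOUNT_DIR_PATTERN = re.compile(r"nt_qq_[0-9a-f]{32}")


def find_account_dirs(qq_root: Path) -> list[Path]:
    """Return the per-account data folders found directly under qq_root.

    qq_root is assumed to be the ".../Application Support/QQ" directory.
    A missing directory simply yields an empty list.
    """
    if not qq_root.is_dir():
        return []
    return sorted(
        entry
        for entry in qq_root.iterdir()
        if entry.is_dir() and ACCOUNT_DIR_PATTERN.fullmatch(entry.name)
    )
```

On a Mac it could be called as find_account_dirs(Path.home() / "Library/Containers/com.tencent.qq/Data/Library/Application Support/QQ"), assuming QQ keeps its data in the location shown above.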
In this folder there are three subdirectories: nt_data, nt_db, nt_temp. nt_data stores unencrypted media files, including videos, emojis, pictures, audios, etc. It can be easily backed up and integrated with a decrypted database. nt_temp just keeps some libvips thumbnails.
What we really need to pay attention to is nt_db, which stores all the db files. It has the following files.
As shown, the chat history is stored in db files, and the file names indicate their different uses. They all share the same internal structure, but the one to focus on first is nt_msg.db. It takes up most of the disk space and holds the most important records: the user’s chat history.
Some bytes are replaced with ?? to avoid possible leaks of privacy. It is apparent that the really useful data starts at offset 0x400. The first 0x400 (1024) bytes have nothing to do with the db content, so we can call them a “redundant SQLite header”. This header tells us that the file is created by an internal fork of SQLite. Considering that the company leaves unencrypted what may actually leak more privacy (all the media in nt_data) while making what people really want to read and preserve harder to handle (a custom file structure and tedious encryption of the db files), the rest of the file must be encrypted in some way.
But first, let’s strip the 1024-byte header from the file.
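A minimal Python sketch of that truncation follows. It copies everything after the first 1024 bytes into a new file and leaves the original untouched. The function name strip_header and the output file name in the usage note are assumptions made for this example, not part of any QQ or SQLCipher tooling.

```python
import shutil
from pathlib import Path

HEADER_SIZE = 1024  # 0x400 bytes of non-database header, as described above


def strip_header(src: Path, dst: Path, header_size: int = HEADER_SIZE) -> int:
    """Copy src to dst without its first header_size bytes.

    Returns the number of bytes written. The copy is streamed, so a
    multi-gigabyte nt_msg.db does not have to fit in memory.
    """
    if src.stat().st_size < header_size:
        raise ValueError(f"{src} is smaller than the {header_size}-byte header")
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(header_size)          # skip the redundant header
        shutil.copyfileobj(fin, fout)  # stream the remaining bytes
        return fout.tell()
```

For example, strip_header(Path("nt_msg.db"), Path("nt_msg.clean.db")) would write the stripped copy next to a backup of the original; nt_msg.clean.db is just a hypothetical name.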
The company will never tell you how to decrypt the database, but people need to know. Luckily, some great hackers have already done excellent work on this in the qq-win-db-key project.
From a hacker’s perspective, the program must eventually obtain the plaintext passphrase at runtime and keep it in its memory space, so disassembly is needed to analyze the decryption process. The good news is that QQ NT is built on a cross-platform Electron base and uses the same (backend) code on all desktop (Windows, macOS, Linux) and mobile (Android, iOS) platforms and all supported architectures (AMD64, ARM64, LoongArch, MIPS).
Let’s take macOS on ARM64 (the latest Macintosh hardware) as an example. Software developers usually ship a universal binary (a single executable for both Intel and Apple Silicon). We focus only on its ARM64 (aarch64) part.
Most readers may prefer to follow the original guide, which uses a friendlier GUI (Hopper Disassembler). Here we use the console utility objdump. Before going any further, open the terminal and cd to your favorite directory. The function that decrypts the databases is labeled nt_sqlite3_key_v2, which is derived from SQLCipher’s sqlite3_key_v2. Let’s find out what calls it.
In the ARMv8 instruction set, b stands for “branch unconditionally”. It transfers control to the instruction at the given address. Let’s have ChatGPT explain this piece of disassembly further.
The part of the disassembly appears to be the prologue of a function. In assembly language, a function prologue typically consists of instructions that set up the function’s stack frame, save callee-saved registers, and perform other necessary setup before executing the main body of the function.
Here’s what each instruction in the prologue appears to be doing:
b 0x232cb50: This instruction branches unconditionally to the address 0x232cb50, which likely marks the beginning of the function.
sub sp, sp, #64: This instruction subtracts 64 from the stack pointer (sp), likely to reserve space on the stack for local variables or other function-specific data.
stp x22, x21, [sp, #16], stp x20, x19, [sp, #32], stp x29, x30, [sp, #48]: These instructions store the contents of specific registers onto the stack. This is often done to preserve the values of callee-saved registers across function calls.
add x29, sp, #48: This instruction sets up the frame pointer (x29) to point to the current stack frame, likely for easier access to local variables within the function.
mov instructions: These instructions move values into various registers (x19, x20, x21, x22, etc.). These could be setting up parameters for the function or initializing local variables.
stp x0, x1, [sp]: This instruction stores the contents of registers x0 and x1 onto the stack, possibly as part of function parameter passing or saving values for later use.
Overall, this part of the disassembly is setting up the function’s environment, including allocating space on the stack, saving necessary registers, and initializing variables, in preparation for the execution of the function’s main body.
Based on ChatGPT’s explanation, we can set a breakpoint at 0x232cb50 so that we can read the registers directly and find the plaintext key in the runtime memory space.
Get The Cipher
For convenience, System Integrity Protection (SIP) is disabled on this Mac so that the lldb debugger can attach to any running binary.
Open QQ.app and log in to the account whose databases need decryption. Then, in the terminal, search for the PID of the main process.
ps aux | grep 'QQ$' | awk '{print $2}'
It outputs a number, which is the PID of the process. In this article, we get 37456. Open lldb, attach to the process by PID, find the runtime address of the wrapper.node image, and set a breakpoint at that address plus the offset 0x232cb50.
bash-3.2$ lldb --attach-pid 37456
(lldb) process attach --pid 37456
Process 37456 stopped
* thread #1, name = 'CrBrowserMain', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x000000018a5aa1f4 libsystem_kernel.dylib`mach_msg2_trap + 8
libsystem_kernel.dylib`mach_msg2_trap:
->  0x18a5aa1f4 <+8>: ret

libsystem_kernel.dylib`macx_swapon:
    0x18a5aa1f8 <+0>: mov    x16, #-0x30
    0x18a5aa1fc <+4>: svc    #0x80
    0x18a5aa200 <+8>: ret
Target 0: (QQ) stopped.
Executable module set to "/Applications/QQ.app/Contents/MacOS/QQ".
Architecture set to: arm64-apple-macosx-.
Add the image’s runtime base address and the offset (expr), set a breakpoint at the result (br), and continue the program (c). It hits the breakpoint within a second.
(lldb) expr 0x00000001182e8000 + 0x232cb50
(long) $0 = 4737551184
(lldb) br s -a 4737551184
Breakpoint 1: where = wrapper.node`___lldb_unnamed_symbol392147, address = 0x000000011a614b50
(lldb) c
Process 37456 resuming
Process 37456 stopped
* thread #37, name = 'thread_general_fixed_2', stop reason = breakpoint 1.1
    frame #0: 0x000000011a614b50 wrapper.node`___lldb_unnamed_symbol392147
wrapper.node`___lldb_unnamed_symbol392147:
->  0x11a614b50 <+0>:  sub    sp, sp, #0x40
    0x11a614b54 <+4>:  stp    x22, x21, [sp, #0x10]
    0x11a614b58 <+8>:  stp    x20, x19, [sp, #0x20]
    0x11a614b5c <+12>: stp    x29, x30, [sp, #0x30]
Target 0: (QQ) stopped.
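The address arithmetic in this session can be double-checked outside the debugger. The short Python sketch below repeats the same addition. The two constants are the values from this particular run (the load address of wrapper.node changes between launches), and the function name runtime_address is only illustrative.

```python
# Values from the lldb session above; the base address differs on every launch.
IMAGE_BASE = 0x00000001182E8000   # runtime load address of wrapper.node
FUNCTION_OFFSET = 0x232CB50       # offset of the target function in the binary


def runtime_address(image_base: int, offset: int) -> int:
    """Return the in-memory address of a function given its static offset."""
    return image_base + offset


breakpoint_address = runtime_address(IMAGE_BASE, FUNCTION_OFFSET)
print(breakpoint_address)       # 4737551184, the decimal value lldb printed
print(hex(breakpoint_address))  # 0x11a614b50, the address of Breakpoint 1
```

Both printed values match what lldb reports: the decimal result of expr and the hexadecimal address shown for the breakpoint.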
With reference to the signature of the sqlite3_key_v2 function,
int sqlite3_key_v2(
  sqlite3 *db,                /* Database to be keyed, ARM64 register x0 */
  const char *zDbName,        /* Name of the database, ARM64 register x1 */
  const void *pKey, int nKey  /* The key, ARM64 register x2 and x3 */
);
And by running
(lldb) register read x3
      x3 = 0x0000000000000010
we know that the length of the key is 0x10 or 16.
Next we read register x2, which stores the memory address of the key. Specify the character format (c), 16 elements to read, and an element size of 1 byte, for example with memory read --format c --count 16 --size 1 $x2.
Therefore, the passphrase is )`P[F5?_]7KmSg?Z (modified for privacy) in this case.
If It Fails Integrity Check
The only difference between the QQ NT databases and the default SQLCipher configuration is that the number of iterations used for PBKDF2 key derivation is changed to 4000. We specify this with PRAGMA kdf_iter = 4000;.
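As a rough sketch of how the key and the iteration count fit together, the Python helper below only builds the SQL text for a SQLCipher shell session and does not touch any database. PRAGMA key and PRAGMA kdf_iter follow the description above. The ATTACH / sqlcipher_export / DETACH lines are the usual SQLCipher idiom for writing a plaintext copy and are an addition of this sketch, not something taken from QQ. The names sql_quote and build_export_script are made up here, and a given SQLCipher build may need further PRAGMA settings that are not covered.

```python
def sql_quote(text: str) -> str:
    """Quote a string as an SQL literal, doubling any single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_export_script(passphrase: str, plain_db: str, kdf_iter: int = 4000) -> str:
    """Build the SQL statements that key the database and export a plaintext copy.

    passphrase: the 16-character key read from memory.
    plain_db:   path of the unencrypted output database (hypothetical name).
    kdf_iter:   PBKDF2 iteration count; 4000 is the value stated in the article.
    """
    lines = [
        f"PRAGMA key = {sql_quote(passphrase)};",
        f"PRAGMA kdf_iter = {kdf_iter};",
        # Assumption: standard SQLCipher export to an unencrypted database.
        f"ATTACH DATABASE {sql_quote(plain_db)} AS plaintext KEY '';",
        "SELECT sqlcipher_export('plaintext');",
        "DETACH DATABASE plaintext;",
    ]
    return "\n".join(lines) + "\n"
```

The returned text could be saved to a file and fed to the sqlcipher command-line shell opened on the header-stripped database. Quoting the passphrase through sql_quote matters because a key may itself contain a single quote.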
It is possible that the program exits unexpectedly at some point and corrupts part of the databases. In this case, the database cannot be exported successfully. See the example below.
Now we can read the decrypted database freely and manage the data inside. The data structures and protocol buffers inside the database require further research, but that is a much simpler task.