Unlocking Memories: Extracting QQ NT Chat History from Encrypted Databases

Problem Formulation

Memories are precious gifts. While most people keep their valued physical belongings in a carefully chosen place, few take the same care to preserve their digital property.

Tencent QQ is one example. This Chinese instant messaging program has been popular for decades, yet there is still no good way to store the data people generate on the platform. At present, taking full control of one's private data means exporting it from the program's database.

Unfortunately, the program offers no official interface for exporting chat history, which is ridiculous and irresponsible. If one does not strictly follow the data transfer method provided, the data may be permanently lost.

Everyone has the right to full control over their own private data, so this post tries to improve a situation in which enterprises scorn it.

Data File Structure

QQ NT introduces a brand new data layout and is rebuilt on new frameworks that utilize modern hardware “effectively”.

⚠️ This article was written on macOS Sonoma 14.4, so every environment shown here is macOS. Adapt the steps to your own system as needed.

The data files of a single user are located in /Users/gilbert/Library/Containers/com.tencent.qq/Data/Library/Application Support/QQ/nt_qq_{MD5}, where gilbert is the macOS user name and {MD5} is a lowercase 32-character MD5 string. For example, the folder may be named nt_qq_9f59fe26a5631419b727d23340f009b0.
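
The naming rule above is easy to check mechanically. Below is a minimal sketch that lists the per-account folders under the QQ data directory. The function name find_account_dirs is only illustrative, and the default path assumes the sandboxed macOS build described in this article.

```python
import re
from pathlib import Path

# Folder names look like nt_qq_ followed by 32 lowercase hex characters.
ACCOUNT_DIR = re.compile(r"nt_qq_[0-9a-f]{32}")

def find_account_dirs(qq_root):
    """Return the per-account data folders found directly under qq_root."""
    root = Path(qq_root)
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and ACCOUNT_DIR.fullmatch(p.name)
    )

if __name__ == "__main__":
    # Assumed default location for the sandboxed macOS build of QQ.
    default_root = (Path.home() / "Library/Containers/com.tencent.qq/Data"
                    / "Library/Application Support/QQ")
    if default_root.is_dir():
        for folder in find_account_dirs(default_root):
            print(folder)
```

If several accounts have logged in on the same machine, each one should show up as its own nt_qq_{MD5} folder.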

In this folder there are three subdirectories: nt_data, nt_db, and nt_temp. nt_data stores unencrypted media files, including videos, emojis, pictures, and audio. It can easily be backed up and combined with a decrypted database. nt_temp only keeps some libvips thumbnails.

What we really need to pay attention to is nt_db, which stores all the db files. It has the following files.

bash-3.2$ tree
.
├── bc_09.db
├── buddy_msg_fts.db
├── buddy_msg_fts.db-first.material
├── collection.db
├── data_line_msg_fts.db
├── discuss_msg_fts.db
├── emoji.db
├── file_assistant.db
├── files_in_chat.db
├── group_info.db
├── group_msg_fts.db
├── group_msg_fts.db-first.material
├── group_msg_fts.db-last.material
├── guild.db
├── guild.db-shm
├── guild.db-wal
├── guild1.db
├── guild1.db-shm
├── guild1.db-wal
├── guild_msg.db
├── misc.db
├── msg_fts.db
├── nt_msg.db
├── nt_msg.db-first.material
├── nt_msg.db-last.material
├── profile_info.db
├── profile_like.db
├── rdelivery.db
├── recent_contact.db
├── rich_media.db
├── settings.db
└── yffm.db

1 directory, 32 files
bash-3.2$

As shown, the chat history is stored in db files, and the file names indicate their purposes. They all share the same inner structure, but the one to focus on first is nt_msg.db. It takes up most of the disk space and holds the most important records: the user’s chat history.

Let’s inspect the beginning of the file in hexadecimal.

00000000h: 53 51 4C 69 74 65 20 68 65 61 64 65 72 20 33 00 ; SQLite header 3.
00000010h: 04 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000020h: 51 51 5F 4E 54 20 44 42 83 00 00 00 12 80 01 61 ; QQ_NT DB.......a
00000030h: 65 30 64 32 61 37 30 32 65 35 39 61 66 33 64 37 ; e0d2a702e59af3d7
00000040h: 62 38 66 61 39 30 38 38 33 37 30 62 61 39 38 34 ; b8fa9088370ba984
00000050h: 66 31 35 32 35 39 36 31 31 32 61 38 38 34 39 39 ; f152596112a88499
00000060h: 39 63 32 64 35 66 ?? ?? ?? 39 65 30 64 66 63 36 ; 9c2d5f...9e0dfc6
00000070h: 35 37 38 64 37 61 37 65 36 64 37 32 35 30 37 35 ; 578d7a7e6d725075
00000080h: 63 33 65 63 38 36 34 63 38 39 65 62 34 63 38 64 ; c3ec864c89eb4c8d
00000090h: 33 39 34 31 32 33 37 39 62 30 35 33 32 64 35 64 ; 39412379b0532d5d
000000a0h: 36 35 30 38 32 37 64 33 35 38 62 62 64 65 32 00 ; 650827d358bbde2.
000000b0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000000c0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000000d0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000000e0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000000f0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000100h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000110h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000120h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000130h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000140h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000150h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000160h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000170h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000180h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000190h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000001a0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000001b0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000001c0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000001d0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000001e0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000001f0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000200h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000210h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000220h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000230h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000240h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000250h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000260h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000270h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000280h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000290h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000002a0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000002b0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000002c0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000002d0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000002e0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000002f0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000300h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000310h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000320h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000330h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000340h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000350h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000360h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000370h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000380h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000390h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000003a0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000003b0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000003c0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000003d0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000003e0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
000003f0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
00000400h: 68 ?? BB 5C ?? 2B E3 ?? F8 37 ?? 14 66 ?? 44 D0 ; h.»\.+ã..7..f.DÐ
00000410h: A4 ?? 6E D6 ?? D5 36 ?? 1F 8A ?? 92 3F ?? BA 0D ; ¤.nÖ.Õ6.....?.º.
...

Some bytes are replaced with ?? to avoid leaking private data. The real data apparently starts at offset 0x400: the first 0x400 (1024) bytes have nothing to do with the db content, so we can call them a “redundant SQLite header”. This header suggests that the file is created by an internal fork of SQLite. Considering that the company leaves unencrypted what may actually leak more privacy (all the media in nt_data) while making what people really want to read and preserve harder to handle (a self-defined file structure and tedious encryption of the db files), the file must be encrypted in some way.
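
To tell whether a given file uses this layout, the first bytes can be compared with the dump above. The following is only a sketch: the constants are copied from this single dump, the function names are made up for illustration, and nothing is assumed about the meaning of the other header bytes.

```python
HEADER_SIZE = 0x400             # 1024 bytes, as observed in the dump
MAGIC = b"SQLite header 3\x00"  # bytes 0x00-0x0f in the dump
TAG = b"QQ_NT DB"               # bytes 0x20-0x27 in the dump

def looks_like_qq_nt_db(head):
    """Check the first bytes of a file against the layout seen in the dump."""
    return (
        len(head) >= HEADER_SIZE
        and head[:16] == MAGIC
        and head[0x20:0x28] == TAG
    )

def read_head(path, size=HEADER_SIZE):
    """Read the first size bytes of the file at path."""
    with open(path, "rb") as f:
        return f.read(size)
```

For example, looks_like_qq_nt_db(read_head("nt_msg.db")) should be true for the file inspected here, while an ordinary SQLite file, which begins with "SQLite format 3", fails the check.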

First, let’s strip the 1024-byte header:

cat ./nt_msg.db | tail -c +1025 > ./nt_msg.clean.db
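
For readers who prefer a script, or who work on a system without tail, the same step can be sketched in Python. The function name strip_header is illustrative, and the header size is the 1024 bytes observed above.

```python
import shutil

HEADER_SIZE = 1024  # length of the extra header observed above

def strip_header(src, dst, header_size=HEADER_SIZE):
    """Copy src to dst, skipping the first header_size bytes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(header_size)
        shutil.copyfileobj(fin, fout)
```

Calling strip_header("nt_msg.db", "nt_msg.clean.db") should produce the same file as the tail command.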

Disassemble And Analyze

The company will never tell you how to decrypt the database, but people need to. Luckily, some great hackers have already done impressive work in the qq-win-db-key project.

From a hacker’s perspective, the running program must eventually obtain the plaintext passphrase and keep it in its memory space, so disassembly is needed to analyze the decryption process. The good news is that QQ NT is cross-platform: the desktop client is built on Electron, and the same (backend) code is used on all desktop (Windows, macOS, Linux) and mobile (Android, iOS) platforms and on all supported architectures (AMD64, ARM64, LoongArch, MIPS).

Let’s take macOS on ARM64 (the latest Macintosh hardware) as an example. Software developers usually ship a universal binary (a single executable for both Intel and Apple Silicon). We focus only on its ARM64 (aarch64) part.

Most readers may prefer to follow the original guide, which uses a friendlier GUI (Hopper Disassembler). Here we use the console utility objdump. Before going any further, open a terminal and cd to a working directory of your choice. The function that decrypts the databases is labeled nt_sqlite3_key_v2, which is derived from SQLCipher’s sqlite3_key_v2. Let’s find where that name shows up in the disassembly.

cp /Applications/QQ.app/Contents/Resources/app/wrapper.node .
objdump -d wrapper.node | grep -B 20 "nt_sqlite3_key_v2"

The output is:

232cb2c: 1b 02 00 94 bl 0x232d398
232cb30: 81 15 00 b0 adrp x1, 689 ; 0x25dd000
232cb34: 21 f0 10 91 add x1, x1, #1084 ; literal pool for: "main"
232cb38: 55 02 00 94 bl 0x232d48c
232cb3c: fd 7b 43 a9 ldp x29, x30, [sp, #48]
232cb40: f4 4f 42 a9 ldp x20, x19, [sp, #32]
232cb44: f6 57 41 a9 ldp x22, x21, [sp, #16]
232cb48: ff 03 01 91 add sp, sp, #64
232cb4c: 01 00 00 14 b 0x232cb50
232cb50: ff 03 01 d1 sub sp, sp, #64
232cb54: f6 57 01 a9 stp x22, x21, [sp, #16]
232cb58: f4 4f 02 a9 stp x20, x19, [sp, #32]
232cb5c: fd 7b 03 a9 stp x29, x30, [sp, #48]
232cb60: fd c3 00 91 add x29, sp, #48
232cb64: f3 03 03 aa mov x19, x3
232cb68: f4 03 02 aa mov x20, x2
232cb6c: f6 03 01 aa mov x22, x1
232cb70: f5 03 00 aa mov x21, x0
232cb74: e0 07 00 a9 stp x0, x1, [sp]
232cb78: 61 16 00 d0 adrp x1, 718 ; 0x25fa000
232cb7c: 21 94 39 91 add x1, x1, #3685 ; literal pool for: "nt_sqlite3_key_v2: db=%p zDb=%s"
232cb80: 06 02 00 94 bl 0x232d398
232cb84: b5 01 00 b4 cbz x21, 0x232cbb8
232cb88: 94 01 00 b4 cbz x20, 0x232cbb8
232cb8c: 73 01 00 34 cbz w19, 0x232cbb8
232cb90: e0 03 15 aa mov x0, x21
232cb94: e1 03 16 aa mov x1, x22
232cb98: 89 fe ff 97 bl 0x232c5bc
232cb9c: e1 03 00 aa mov x1, x0
232cba0: 3b 02 00 94 bl 0x232d48c
232cba4: fd 7b 43 a9 ldp x29, x30, [sp, #48]
232cba8: f4 4f 42 a9 ldp x20, x19, [sp, #32]
232cbac: f6 57 41 a9 ldp x22, x21, [sp, #16]
232cbb0: ff 03 01 91 add sp, sp, #64
232cbb4: 4a ff ff 17 b 0x232c8dc
232cbb8: 61 16 00 d0 adrp x1, 718 ; 0x25fa000
232cbbc: 21 14 3a 91 add x1, x1, #3717 ; literal pool for: "nt_sqlite3_key_v2: no key provided"

In the ARMv8 instruction set, b stands for “branch unconditionally”: it transfers control to the instruction at the given address. Let’s have ChatGPT explain this piece of disassembly further.

The part of the disassembly appears to be the prologue of a function. In assembly language, a function prologue typically consists of instructions that set up the function’s stack frame, save callee-saved registers, and perform other necessary setup before executing the main body of the function.

Here’s what each instruction in the prologue appears to be doing:

b 0x232cb50: This instruction branches unconditionally to the address 0x232cb50, which likely marks the beginning of the function.

sub sp, sp, #64: This instruction subtracts 64 from the stack pointer (sp), likely to reserve space on the stack for local variables or other function-specific data.

stp x22, x21, [sp, #16], stp x20, x19, [sp, #32], stp x29, x30, [sp, #48]: These instructions store the contents of specific registers onto the stack. This is often done to preserve the values of callee-saved registers across function calls.

add x29, sp, #48: This instruction sets up the frame pointer (x29) to point to the current stack frame, likely for easier access to local variables within the function.

mov instructions: These instructions move values into various registers (x19, x20, x21, x22, etc.). These could be setting up parameters for the function or initializing local variables.

stp x0, x1, [sp]: This instruction stores the contents of registers x0 and x1 onto the stack, possibly as part of function parameter passing or saving values for later use.

Overall, this part of the disassembly is setting up the function’s environment, including allocating space on the stack, saving necessary registers, and initializing variables, in preparation for the execution of the function’s main body.

Following ChatGPT’s explanation, 0x232cb50 is the entry of the function, so we can set a breakpoint there, read the registers directly, and find the plaintext key in the process memory.
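
The offset differs between QQ builds, so it has to be located again after every update. The manual search can be sketched as a small heuristic: find the line that references the log string, then walk backwards to the nearest stack-allocating sub sp, sp instruction. This is only an assumption about how the prologue looks in this binary, and find_entry is an illustrative name.

```python
import re

# A stack-allocating "sub sp, sp, #N" is taken as the start of a function.
PROLOGUE = re.compile(r"\bsub\s+sp,\s*sp,\s*#")

def find_entry(disassembly, marker="nt_sqlite3_key_v2: db="):
    """Return the address of the prologue preceding the marker string."""
    lines = disassembly.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            # Walk backwards from the string reference to the prologue.
            for prev in reversed(lines[:i]):
                if PROLOGUE.search(prev):
                    return int(prev.split(":", 1)[0].strip(), 16)
    return None
```

Fed with the objdump output shown above, the function should return 0x232cb50. A function whose prologue does not start with sub sp, sp would defeat the heuristic, so the result is worth checking by eye.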

Get The Cipher

For convenience, System Integrity Protection (SIP) is disabled on this macOS machine so that the lldb debugger can attach to any running binary.

Open QQ.app and log in to the account whose databases need decrypting. Then, in the terminal, search for the PID of the main process.

ps aux | grep 'QQ$' | awk '{print $2}'    

It outputs a number, which is the PID of the process; in this article it is 37456. Open lldb, attach to the process by PID, find the runtime load address of the wrapper.node image, and set a breakpoint at that address plus the offset 0x232cb50.

bash-3.2$ lldb --attach-pid 37456
(lldb) process attach --pid 37456
Process 37456 stopped
* thread #1, name = 'CrBrowserMain', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x000000018a5aa1f4 libsystem_kernel.dylib`mach_msg2_trap + 8
libsystem_kernel.dylib`mach_msg2_trap:
-> 0x18a5aa1f4 <+8>: ret

libsystem_kernel.dylib`macx_swapon:
0x18a5aa1f8 <+0>: mov x16, #-0x30
0x18a5aa1fc <+4>: svc #0x80
0x18a5aa200 <+8>: ret
Target 0: (QQ) stopped.
Executable module set to "/Applications/QQ.app/Contents/MacOS/QQ".
Architecture set to: arm64-apple-macosx-.

Find the address at which wrapper.node is loaded.

(lldb) image list -o -f | grep /Applications/QQ.app/Contents/Resources/app/wrapper.node
[ 0] 0x00000001182e8000 /Applications/QQ.app/Contents/Resources/app/wrapper.node

Do a simple addition (expr), set a breakpoint (br), and continue the program (c). It will hit the breakpoint within a second.

(lldb) expr 0x00000001182e8000 + 0x232cb50
(long) $0 = 4737551184
(lldb) br s -a 4737551184
Breakpoint 1: where = wrapper.node`___lldb_unnamed_symbol392147, address = 0x000000011a614b50
(lldb) c
Process 37456 resuming
Process 37456 stopped
* thread #37, name = 'thread_general_fixed_2', stop reason = breakpoint 1.1
frame #0: 0x000000011a614b50 wrapper.node`___lldb_unnamed_symbol392147
wrapper.node`___lldb_unnamed_symbol392147:
-> 0x11a614b50 <+0>: sub sp, sp, #0x40
0x11a614b54 <+4>: stp x22, x21, [sp, #0x10]
0x11a614b58 <+8>: stp x20, x19, [sp, #0x20]
0x11a614b5c <+12>: stp x29, x30, [sp, #0x30]
Target 0: (QQ) stopped.
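
The addition done with expr can be double-checked outside the debugger. The sketch below uses the two numbers from this session. The load address changes on every launch, so it must be read again from image list each time. The function name is illustrative.

```python
def runtime_address(load_address, file_offset):
    """Address of an instruction once the image is loaded into the process."""
    return load_address + file_offset

addr = runtime_address(0x00000001182E8000, 0x232CB50)
print(hex(addr), addr)  # prints 0x11a614b50 4737551184, matching lldb above
```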

Recall the signature of the sqlite3_key_v2 function:

int sqlite3_key_v2(
  sqlite3 *db,               /* Database to be keyed, ARM64 register x0 */
  const char *zDbName,       /* Name of the database, ARM64 register x1 */
  const void *pKey, int nKey /* The key, ARM64 register x2 and x3 */
);

And by running

(lldb) register read x3
x3 = 0x0000000000000010

we know that the length of the key is 0x10 or 16.

Next we read register x2, which stores the memory address of the key. We specify the character format (c), 16 elements to read, and an element size of 1 byte.

(lldb) register read x2
x2 = 0x0000010805f442c0
(lldb) memory read --format c --count 16 --size 1 0x0000010805f442c0
0x10805f442c0: )`P[F5?_]7KmSg?Z
(lldb) detach
Process 37456 detached
(lldb) exit
bash-3.2$

Therefore, the passphrase is )`P[F5?_]7KmSg?Z (modified due to privacy) in this case.

If It Fails Integrity Check

The only difference between the QQ NT databases and the default SQLCipher configuration is the number of PBKDF2 key derivation iterations, which is changed to 4000. We specify this with PRAGMA kdf_iter = 4000;.
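
To make the role of kdf_iter concrete, here is a sketch of the key derivation it controls. It assumes the other SQLCipher 4 defaults stay in place, as stated above: PBKDF2 with HMAC-SHA512, a 32-byte key, and a 16-byte salt stored at the very start of the encrypted file (here, the file after the 1024-byte header has been stripped). The function names are illustrative and this is not code taken from QQ.

```python
import hashlib

def derive_key(passphrase, salt, iterations=4000):
    """PBKDF2-HMAC-SHA512 with a 32-byte output, as in SQLCipher 4 defaults."""
    return hashlib.pbkdf2_hmac("sha512", passphrase, salt, iterations, dklen=32)

def read_salt(path):
    """SQLCipher keeps a 16-byte salt at the start of the encrypted file."""
    with open(path, "rb") as f:
        return f.read(16)
```

SQLCipher 4 uses 256000 iterations unless told otherwise, so without the PRAGMA it derives a different key from the same passphrase and cannot read the pages.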

The program may exit unexpectedly at any time and leave part of a database damaged. In that case, the database cannot be exported successfully, as in the example below.

$ sqlcipher ./nt_msg.clean.db
SQLite version 3.44.2 2023-11-24 11:41:44 (SQLCipher 4.5.6 community)
Enter ".help" for usage hints.
sqlite> PRAGMA key = ')`P[F5?_]7KmSg?Z';
ok
sqlite> PRAGMA kdf_iter = 4000;
sqlite> .tables
c2c_msg_flow_table msg_unread_info_table
c2c_msg_table nt_bookmark_table
c2c_temp_msg_flow_table nt_kv_storage_table
c2c_temp_msg_table nt_uid_mapping_table
dataline_flow_table pai_yi_pai_msg_id_table
dataline_msg_table recent_contact_delete_storage
discuss_msg_flow_table recent_contact_table
discuss_msg_table recent_contact_top_table
draft_storage_table_v1 recent_contact_v3_table
game_msg_config_table search_history
group_msg_flow_table service_assistant_contact
group_msg_table service_assistant_msg_table
hidden_session_storage_table_v1 them_module_storage_table_v1
msg_backup_storage_table
sqlite> ATTACH DATABASE 'plaintext.db' AS plaintext KEY '';
sqlite> SELECT sqlcipher_export('plaintext');
Runtime error: database disk image is malformed
sqlite> PRAGMA integrity_check;
*** in database main ***
Tree 39 page 259831 cell 137: 2nd reference to page 262145
Page 262144: never used
Runtime error: database disk image is malformed (11)
sqlite>
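
When several databases need the same treatment, the statements typed above can be generated as a script. The sketch below only builds the text. The helper names are illustrative, and the statements are the ones from the session above.

```python
def sql_quote(text):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_export_script(passphrase, plaintext_path="plaintext.db", kdf_iter=4000):
    """Return the statements typed in the session above as one script."""
    return "\n".join([
        f"PRAGMA key = {sql_quote(passphrase)};",
        f"PRAGMA kdf_iter = {int(kdf_iter)};",
        f"ATTACH DATABASE {sql_quote(plaintext_path)} AS plaintext KEY '';",
        "SELECT sqlcipher_export('plaintext');",
    ])
```

The resulting text can be piped into the sqlcipher shell on standard input, for example with subprocess.run(["sqlcipher", "nt_msg.clean.db"], input=script, text=True). Quoting matters because a recovered passphrase may itself contain a single quote.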

Running integrity_check shows that the database is not intact, which is why sqlcipher_export fails. We need to repair the database.

Referring to this article, we can try to repair the database by running

sqlite> .mode insert 
sqlite> .output ./dump_all.sql
sqlite> .dump
sqlite> .exit

This exports the database as plain SQL statements. We then regenerate the db file with

# Drop only whole-line transaction statements so that rows containing these words survive.
grep -v -x -E 'BEGIN TRANSACTION;|COMMIT;|ROLLBACK;.*' ./dump_all.sql > ./dump_all_notrans.sql
sqlite3 ./repair.db ".read ./dump_all_notrans.sql"

repair.db is the final output.
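
The rebuild step can also be sketched with Python's built-in sqlite3 module. The dump of a damaged database typically ends with a ROLLBACK statement, which would undo every insert if replayed, so the transaction statements are removed first. This sketch matches only whole-line transaction statements, so a row whose text merely contains the word COMMIT is kept. The function names are illustrative.

```python
import re
import sqlite3

# Whole-line transaction statements emitted by .dump; row data is never matched.
CONTROL = re.compile(r"(BEGIN TRANSACTION;|COMMIT;|ROLLBACK;.*)")

def strip_transactions(dump_sql):
    """Remove only the transaction-control lines from a .dump script."""
    kept = [line for line in dump_sql.split("\n")
            if not CONTROL.fullmatch(line.strip())]
    return "\n".join(kept)

def rebuild(dump_sql, target=":memory:"):
    """Replay the filtered script into a fresh SQLite database."""
    conn = sqlite3.connect(target)
    conn.executescript(strip_transactions(dump_sql))
    conn.commit()
    return conn
```

Passing a file name such as "repair.db" as target writes the rebuilt database to disk instead of keeping it in memory.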

Next Steps

Now we can freely read the decrypted database and manage the data inside. The table layouts and the protocol buffers stored in them require further research, but that is a much simpler task.


Source: https://lucisurbe.pages.dev/2024/03/20/Unlocking-Memories-Extracting-QQ-NT-Chat-History-from-Encrypted-Databases/
Author: Lucis Urbe
Posted on: March 20, 2024