Chinese character encoding when exporting CSV from PHP for Excel 97-2003

Phanix

Exporting data is not hard in itself, but for old versions of Excel such as Excel 97-2003, the encoding of Chinese characters is a hassle.

Relatively recent versions of Excel support Chinese characters in common Unicode encodings such as UTF-8. In the early days, however, each language (Chinese, Japanese, many European languages) had its own encoding, so the same code value could stand for different characters in different languages, and the result was a mess.

Before UTF-8 unified the field, the UCS encodings also dominated for a while (see the difference between UTF-8 and UCS). But UCS-2 uses a fixed 2 bytes per character and cannot cover every language, while UCS-4 uses a fixed 4 bytes per character, which in the era around 2000 was a real waste of bandwidth.

So once UTF-8 entered this mess, using 1 to 4 bytes per character and with no byte-order dependency (little-endian / big-endian), it quickly came to dominate. As of 2019, more than 90% of web pages worldwide already use UTF-8.
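The size trade-offs described above can be checked directly in PHP. The following is a minimal sketch, assuming the mbstring extension is loaded and PHP 7 or later (for the `\u{...}` escapes); the `byteLengths` helper is a hypothetical name introduced only for this illustration.

```php
<?php
/**
 * Hypothetical helper: byte length of one UTF-8 character in three encodings.
 * strlen() counts bytes, not characters, which is what we want here.
 */
function byteLengths(string $utf8Char): array
{
    return [
        'UTF-8'   => strlen($utf8Char),
        'UCS-2LE' => strlen(mb_convert_encoding($utf8Char, 'UCS-2LE', 'UTF-8')),
        'UCS-4LE' => strlen(mb_convert_encoding($utf8Char, 'UCS-4LE', 'UTF-8')),
    ];
}

// An ASCII letter and a Chinese character (U+4E2D), both inside the
// range that UCS-2 can represent.
$latin   = byteLengths('A');         // UTF-8: 1, UCS-2LE: 2, UCS-4LE: 4
$chinese = byteLengths("\u{4E2D}");  // UTF-8: 3, UCS-2LE: 2, UCS-4LE: 4

// UTF-8 alone, across its full 1 to 4 byte range.
$utf8Lengths = [
    strlen('A'),           // 1 byte
    strlen("\u{E9}"),      // 2 bytes (U+00E9)
    strlen("\u{4E2D}"),    // 3 bytes (U+4E2D)
    strlen("\u{1F600}"),   // 4 bytes (U+1F600, outside the range UCS-2 covers)
];

print_r($latin);
print_r($chinese);
print_r($utf8Lengths);
```

Plain ASCII text costs 1 byte per character in UTF-8 but 4 in UCS-4, which is the bandwidth concern mentioned above.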

The ancient Excel 97-2003 uses UCS-2 LE with a BOM (little-endian, with a byte order mark) for Chinese characters, so if you write UTF-8 CSV data straight to a file and then open it in Excel 97-2003, all the Chinese characters come out garbled.
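To see why the bytes do not line up, it can help to dump the same character in both encodings. This is only an illustrative sketch, assuming the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// "\u{4E2D}" is the character U+4E2D, written as an escape so the example
// does not depend on how this source file itself is saved.
$utf8 = "\u{4E2D}";

// The same character re-encoded as little-endian UCS-2.
$ucs2le = mb_convert_encoding($utf8, 'UCS-2LE', 'UTF-8');

// The little-endian byte order mark (FF FE) placed in front of the data.
$withBom = "\xFF\xFE" . $ucs2le;

echo bin2hex($utf8), "\n";     // e4b8ad
echo bin2hex($ucs2le), "\n";   // 2d4e
echo bin2hex($withBom), "\n";  // fffe2d4e
```

The three UTF-8 bytes have nothing in common with the two UCS-2 LE bytes, so a reader expecting one encoding cannot make sense of the other.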

The solution is to use mb_convert_encoding (from the mbstring extension) to convert the data to UCS-2LE, and then prepend the BOM at the start of the file.

 file_put_contents($filename, chr(0xFF) . chr(0xFE) . mb_convert_encoding($strdata, "UCS-2LE", "UTF-8"));

If you want big-endian instead, change the BOM to chr(0xFE) . chr(0xFF) and convert to "UCS-2BE", so that the byte order of the data matches the BOM.
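Putting the pieces together, here is one possible sketch that builds the CSV text from rows and then applies the conversion and BOM. The function name `encodeCsvForLegacyExcel`, its `$bigEndian` flag, and the sample rows are assumptions made for illustration and are not part of the original snippet; it also assumes the input strings are UTF-8 and that the mbstring extension is loaded.

```php
<?php
/**
 * Hypothetical helper: turn rows of UTF-8 strings into UCS-2 CSV bytes
 * with a matching byte order mark.
 */
function encodeCsvForLegacyExcel(array $rows, bool $bigEndian = false): string
{
    // Build the CSV text in memory first, still in UTF-8.
    $stream = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are passed explicitly.
        fputcsv($stream, $row, ',', '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);

    // The BOM and the target encoding must agree on byte order.
    $bom    = $bigEndian ? "\xFE\xFF" : "\xFF\xFE";
    $target = $bigEndian ? 'UCS-2BE' : 'UCS-2LE';

    return $bom . mb_convert_encoding($csv, $target, 'UTF-8');
}

// Sample data: a header row and one data row containing Chinese characters.
$rows = [
    ['id', "\u{4E2D}\u{6587}"],
    ['1', "\u{4E2D}"],
];

$bytes = encodeCsvForLegacyExcel($rows);

// Writing the result would then look like the original one-liner:
// file_put_contents($filename, $bytes);
```

Returning the bytes instead of writing the file directly keeps the encoding step easy to inspect, for example with bin2hex, before anything touches the disk.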

Original link: Phanix's Blog

CC BY-NC-ND 2.0
