In PHP, you need to implement a simple function to decode Unicode escape sequences. Some servers send Unicode-escaped strings in order to support multibyte characters. The encoded result looks something like this: \uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774
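To make the format concrete, here is a small sketch that decodes a single escape by hand. It assumes the mbstring extension is loaded and that the four hex digits after \u are one UTF-16BE code unit, which is what the sample above suggests. The variable names are only for illustration.

```php
<?php
// The four hex digits of "\uBBF8", taken as raw big-endian bytes
$bytes = pack('H*', 'BBF8');

// Reinterpret those two bytes as UTF-16BE and convert them to UTF-8
$char = mb_convert_encoding($bytes, 'UTF-8', 'UTF-16BE');

echo $char, "\n";          // 미
echo bin2hex($char), "\n"; // ebafb8 (the UTF-8 bytes of U+BBF8)
```

The same character can also be produced with mb_chr(0xBBF8, 'UTF-8') on PHP 7.2 or later.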
The function unicode_escape() can be implemented as follows:
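    function unicode_escape($str, $encoding = null)
    {
        // Fall back to mbstring's internal encoding when none is given
        if (is_null($encoding)) {
            $encoding = mb_internal_encoding();
        }

        // A closure is used here because create_function() was removed in PHP 8.0
        return preg_replace_callback(
            '/\\\\u([0-9a-fA-F]{4})/u',
            function ($match) use ($encoding) {
                // Treat the four hex digits as one UTF-16BE code unit
                return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
            },
            $str
        );
    }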
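One limitation worth noting: the function above converts each \uXXXX escape separately, so a character outside the Basic Multilingual Plane, which is written as two escapes (a surrogate pair such as \uD83D\uDE00), would probably not come out as the intended character. The following is a minimal sketch of one way around that, assuming the mbstring extension is available. The name unicode_unescape_runs is hypothetical. It matches a whole run of consecutive escapes and converts the run in one call.

```php
<?php
// Hypothetical helper: decodes runs of \uXXXX escapes, including surrogate pairs.
function unicode_unescape_runs(string $str, string $encoding = 'UTF-8'): string
{
    return preg_replace_callback(
        // One or more consecutive \uXXXX escapes, matched as a single run
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match) use ($encoding) {
            // Drop every "\u" prefix so only the hex digits remain
            $hex = str_replace('\\u', '', $match[0]);
            // Convert the whole run at once so surrogate pairs stay together
            return mb_convert_encoding(pack('H*', $hex), $encoding, 'UTF-16BE');
        },
        $str
    );
}

echo unicode_unescape_runs('\uBBF8\uC158 4'), "\n"; // 미션 4
echo unicode_unescape_runs('\uD83D\uDE00'), "\n";   // 😀 (U+1F600)
```

Text between escapes, such as the space and the digit in the first call, is left untouched because it is not part of any match.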
Below is an example script that examines the source string by trying several possible decoders:
    #!/usr/bin/php
    <?php
    /**
     * Test Code
     *
     * Author: Chun Kang
     * Date: 2021.11.02
     **/

    echo "Please type what you want to decode:\n";
    $src = readline();
    if (!strlen($src)) {
        // Use the sample string when nothing was typed
        $src = "\\uBBF8\\uC158 \\uC774\\uC2A4\\uD0C4\\uBD88 4: \\uC775\\uC2A4\\uD2B8\\uB9BC \\uB370\\uC774";
    }

    echo "\n#Source: {$src}\n\n";
    echo "#Investigation Result\n";

    $resp = urldecode($src);
    echo "case 1) urldecode: {$resp}\n";

    $resp = rawurldecode($src);
    echo "case 2) rawurldecode: {$resp}\n";

    // Note: utf8_decode() is deprecated as of PHP 8.2
    $resp = utf8_decode($src);
    echo "case 3) utf8_decode: {$resp}\n";

    function unicode_escape($str, $encoding = null)
    {
        if (is_null($encoding)) {
            $encoding = mb_internal_encoding();
        }

        // A closure is used here because create_function() was removed in PHP 8.0
        return preg_replace_callback(
            '/\\\\u([0-9a-fA-F]{4})/u',
            function ($match) use ($encoding) {
                return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
            },
            $str
        );
    }

    $resp = unicode_escape($src);
    echo "case 4) unicode_escape: {$resp}\n";

    // The subject must be $src; passing the undefined $str yields an empty result
    $resp = preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function ($match) {
            return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE');
        },
        $src
    );
    echo "case 5) mb_convert_encoding: {$resp}\n";

    echo "\n\n";

The result looks like this:

    Please type what you want to decode:
    \uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774

    #Source: \uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774

    #Investigation Result
    case 1) urldecode: \uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774
    case 2) rawurldecode: \uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774
    case 3) utf8_decode: \uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774
    case 4) unicode_escape: 미션 이스탄불 4: 익스트림 데이
    case 5) mb_convert_encoding: 미션 이스탄불 4: 익스트림 데이
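Since \uXXXX is also the escape syntax used inside JSON strings, one more candidate that could be added to the investigation is json_decode(). The sketch below wraps the input in double quotes and lets the JSON parser do the decoding. The helper name decode_via_json is hypothetical, and the approach assumes the input contains no unescaped double quotes, stray backslashes, or control characters, any of which would make the wrapped text invalid JSON.

```php
<?php
// Hypothetical helper: treats the input as the body of a JSON string literal.
function decode_via_json(string $str): ?string
{
    // Wrap the text in quotes so it parses as a single JSON string
    $decoded = json_decode('"' . $str . '"');

    // json_decode() returns null when the wrapped text is not valid JSON
    return is_string($decoded) ? $decoded : null;
}

$src = '\uBBF8\uC158 \uC774\uC2A4\uD0C4\uBD88 4: \uC775\uC2A4\uD2B8\uB9BC \uB370\uC774';
echo decode_via_json($src), "\n"; // 미션 이스탄불 4: 익스트림 데이
```

This relies on the JSON parser rather than a regular expression, so it is stricter about the input than unicode_escape(): anything that is not a valid JSON string body makes the helper return null instead of a partially decoded string.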