Hello everyone!
This is a question about text file encoding and code pages.
I need to process a lot of text files whose encodings are not uniform: they may be ANSI, UTF-8, UTF-16, GB2312...
The file in this example is encoded as GB2312 (code page 936).
In PowerShell, I have to specify the encoding when reading a file, otherwise the text comes out garbled, so I added code to the PowerShell script that detects the text encoding.
Suppose that, after reading the file, I need to do a replacement: replace "测试" with "正式".
Finally, I need to save the file in its original encoding.
In QM, the text must first be converted to UTF-8 for this to work, otherwise the replacement cannot be completed.
But I don't know how to program encoding and code page handling.
The PowerShell code is below. How can I implement similar encoding and code page detection in QM?
Thanks in advance for any advice and help.
david
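$codes = @'
public static class GuessCoder
{
    // Returns a .NET encoding name guessed from the BOM or, failing that, from the byte pattern.
    public static string Detect(string file)
    {
        byte[] data = System.IO.File.ReadAllBytes(file);

        // A byte-order mark identifies the encoding directly.
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE) { return "Unicode"; }   // UTF-16 LE
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF) { return "UTF-16BE"; }
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) { return "UTF-8"; }

        // No BOM: check whether the bytes form well-formed UTF-8 sequences.
        // Anything that does not is assumed to be GB2312 (code page 936).
        // charByteCounter == 1 means a new character is expected;
        // n > 1 means n - 1 continuation bytes are still expected.
        int charByteCounter = 1;
        byte curByte;
        for (int i = 0; i < data.Length; i++)
        {
            curByte = data[i];
            if (charByteCounter == 1)
            {
                if (curByte >= 0x80)
                {
                    // Count the leading 1 bits of the lead byte to get the sequence length.
                    while (((curByte <<= 1) & 0x80) != 0)
                    {
                        charByteCounter++;
                    }
                    if (charByteCounter == 1 || charByteCounter > 6)
                    {
                        return "GB2312";
                    }
                }
            }
            else
            {
                // Continuation bytes must have the form 10xxxxxx.
                if ((curByte & 0xC0) != 0x80)
                {
                    return "GB2312";
                }
                charByteCounter--;
            }
        }
        // The data ended in the middle of a multi-byte sequence.
        if (charByteCounter > 1)
        {
            return "GB2312";
        }
        return "UTF-8";
    }
}
'@;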
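# Compile the C# helper so that [GuessCoder] is available in this session
Add-Type -TypeDefinition $codes
$file_in = "$HOME\Desktop\Test.txt"
$file_ok = "$HOME\Desktop\Test_ok.txt"
# Detect the encoding name and print it
$checkenc = [GuessCoder]::Detect($file_in)
$checkenc
# Turn the name into an Encoding object and print its properties
$enc = [Text.Encoding]::GetEncoding($checkenc)
$enc
# Read, replace, and write back using the same encoding
$text = [IO.File]::ReadAllText($file_in, $enc)
$text = $text -replace '测试','正式'
[IO.File]::WriteAllText($file_ok, $text, $enc)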
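For comparison, the detection step could probably also be written without the embedded C#, by letting .NET's strict UTF-8 decoder do the validity check. This is only a sketch under assumptions: it assumes Windows PowerShell 5.1 (where code page 936 is available), `Get-TextEncoding` is a hypothetical helper name that is not part of the script above, and, like the original, it simply treats every BOM-less file that is not valid UTF-8 as GB2312. It returns an `Encoding` object rather than a name so that a UTF-8 file without a BOM can be written back without one; as far as I know, `[Text.Encoding]::GetEncoding("UTF-8")` writes a BOM when used with `WriteAllText`.

```powershell
# Hypothetical helper (not from the original script): guess the encoding of a byte array.
# Assumption: any BOM-less data that is not valid UTF-8 is GB2312 (code page 936).
function Get-TextEncoding {
    param([byte[]]$Bytes)

    # A byte-order mark identifies the encoding directly.
    if ($Bytes.Length -ge 3 -and $Bytes[0] -eq 0xEF -and $Bytes[1] -eq 0xBB -and $Bytes[2] -eq 0xBF) {
        return New-Object System.Text.UTF8Encoding($true)      # UTF-8, BOM is written back
    }
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFF -and $Bytes[1] -eq 0xFE) {
        return [Text.Encoding]::Unicode                         # UTF-16 LE
    }
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFE -and $Bytes[1] -eq 0xFF) {
        return [Text.Encoding]::BigEndianUnicode                # UTF-16 BE
    }

    # No BOM: a strict decoder (throwOnInvalidBytes = $true) throws on malformed UTF-8.
    $strict = New-Object System.Text.UTF8Encoding($false, $true)
    try {
        [void]$strict.GetString($Bytes)
        return New-Object System.Text.UTF8Encoding($false)      # UTF-8, no BOM is written back
    } catch {
        return [Text.Encoding]::GetEncoding(936)                # fallback guess: GB2312
    }
}

# Possible usage with the same file names as above:
# $enc  = Get-TextEncoding -Bytes ([IO.File]::ReadAllBytes($file_in))
# $text = [IO.File]::ReadAllText($file_in, $enc)
# [IO.File]::WriteAllText($file_ok, ($text -replace '测试','正式'), $enc)
```

Note that a pure ASCII file passes the UTF-8 check, which is harmless for this replacement because ASCII bytes are identical in UTF-8 and GB2312.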