Thursday, February 18, 2016

Power Translation

Day 22

Some time ago, while on assignment, the only scripting language available to me was PowerShell. It turned out to be a very useful tool. Today I will share some of its features that helped me with Connect:Direct custom translation tables.
I needed help understanding how certain characters are translated between the Mainframe and Windows platforms, and a quick way to see what differed between a customised translation table and the default table on a Windows machine.
I also wanted to experiment with the translation tables without always having to be on the machine that had Connect:Direct installed.
My solution was a set of PowerShell helper functions that allowed me to examine, compare, generate and test Connect:Direct translation tables, without necessarily needing Connect:Direct itself.
Here are some simple PowerShell functions to illustrate. Please be aware that I have deliberately kept these minimal for brevity.
# List all codepages
function lsenc () {
    [System.Text.Encoding]::GetEncodings()
}

# Get an object representing the codepage
function getenc ($str) {
    [System.Text.Encoding]::GetEncoding($str)
}

# Simple filter that displays bytes as hex values
function hex {
    $input | %{ write-host -NoNewline ("{0:x2} " -f $_)}
    ""
}

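# Converts a string or a byte buffer from one codepage to another.
# $str is not used by this minimal version.
function conv ($from,$to,$buf,$str=$false) {
    $from_enc=getenc $from
    $to_enc=getenc $to
    if($buf -isnot [array]) {
        # Not an array, so treat it as a string and encode it in the source codepage first
        [System.Text.Encoding]::Convert($from_enc,$to_enc,$from_enc.getbytes($buf))
    } else {
        # Already a byte buffer, so convert it directly
        [System.Text.Encoding]::Convert($from_enc,$to_enc,$buf)
    }
}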
The above functions can be used as follows:
# Test for well known EBCDIC value
PS C:\Users\nicke> conv 1252 37 " " | hex
40 
# Test for well known ASCII value
PS C:\Users\nicke> conv 37 1252 @(0x40) | hex
20 
# hex value for £ in Windows-1252
PS C:\Users\nicke> conv 1252 1252 "£" | hex
a3 
# hex value for £ in UTF-8
PS C:\Users\nicke> conv 1252 utf-8 "£" | hex
c2 a3 
# hex value for £ in UTF-16 (little-endian)
PS C:\Users\nicke> conv 1252 utf-16 "£" | hex
a3 00 
# hex value of £ in IBM EBCDIC (UK-Euro)
PS C:\Users\nicke> conv 1252 1146 "£" | hex
5b
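The £ example shows that the interesting differences are often between one EBCDIC codepage and another, not just between EBCDIC and Windows. As a rough sketch of how to list those differences, the function below decodes every byte value from 00 to ff under two codepages and reports the bytes whose characters differ. The name Compare-CodePage and its output properties are made up for this illustration and are not part of the helper set above. It also assumes both codepages are single-byte.

```powershell
# Hypothetical helper (not from the original function set): list the byte
# values that decode to different characters under two single-byte codepages.
function Compare-CodePage ($cpA, $cpB) {
    $encA = [System.Text.Encoding]::GetEncoding($cpA)
    $encB = [System.Text.Encoding]::GetEncoding($cpB)
    foreach ($b in 0..255) {
        $bytes = [byte[]]@($b)
        # Assumes one byte decodes to exactly one character in each codepage
        $charA = $encA.GetChars($bytes)[0]
        $charB = $encB.GetChars($bytes)[0]
        # Compare code points rather than strings to avoid case-insensitive
        # or culture-sensitive string comparison
        if ([int]$charA -ne [int]$charB) {
            [pscustomobject]@{
                Byte  = '{0:x2}' -f $b
                CharA = $charA
                CharB = $charB
            }
        }
    }
}

# Example: US EBCDIC (37) against UK-Euro EBCDIC (1146)
Compare-CodePage 37 1146 | Format-Table
```

Run against codepages 37 and 1146, the output should include a row for byte 5b, which is $ in codepage 37 and £ in codepage 1146, in line with the conv result above.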
You can also run conv on longer strings and even on the contents of files.
So if I have a file that is encoded using the mainframe codepage IBM-1146 like so:
PS C:\Users\nicke> (cat -Encoding Byte ibm1146.1146) | hex
c9 c2 d4 60 f1 f1 f4 f6 

PS C:\Users\nicke>
I can translate it to Windows 1252 like so:
PS C:\Users\nicke> conv 1146 1252 (cat -Encoding Byte ibm1146.1146) | Set-Content -Encoding byte ibm1146.1252

PS C:\Users\nicke> type ibm1146.1252

IBM-1146

PS C:\Users\nicke>
So, as you can see, the third parameter to the conv function can be a string, a byte array, or the contents of a file read as a byte array.
I wouldn’t use this for large files, but it is handy with small files for understanding codepages and how text translates between them.
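The file examples rely on -Encoding Byte, which is accepted by Windows PowerShell but was replaced by -AsByteStream in PowerShell 6 and later. As a minimal sketch that sidesteps that difference, the function below reads and writes the bytes through the .NET file APIs instead. Convert-FileEncoding is a hypothetical name chosen for this example, and like conv it loads the whole file into memory, so the same caution about large files applies.

```powershell
# Hypothetical helper (not from the original function set): convert a whole
# file from one codepage to another using .NET file APIs.
function Convert-FileEncoding ($from, $to, $inPath, $outPath) {
    $fromEnc = [System.Text.Encoding]::GetEncoding($from)
    $toEnc   = [System.Text.Encoding]::GetEncoding($to)
    # .NET resolves relative paths against the process working directory,
    # which can differ from the current PowerShell location, so resolve first.
    $inFull  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($inPath)
    $outFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($outPath)
    # Read every byte, convert in memory, and write the result out
    $bytes = [System.IO.File]::ReadAllBytes($inFull)
    $converted = [System.Text.Encoding]::Convert($fromEnc, $toEnc, $bytes)
    [System.IO.File]::WriteAllBytes($outFull, $converted)
}
```

With the file from the earlier example, the call would look like Convert-FileEncoding 1146 1252 ibm1146.1146 ibm1146.1252.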
Next time we will look at actual Connect:Direct translation tables, and how to create custom ones easily with some PowerShell functions.
