
Tuesday, March 1, 2016

Day 23 Translation what's the difference

First let’s create some Connect:Direct translation tables using PowerShell:
PS C:\Users\nicke> conv ibm285 iso-8859-15 @(0..255) | Set-Content -Encoding byte ibm285-iso-8859-15.cdx

PS C:\Users\nicke> conv ibm01146 iso-8859-15 @(0..255) | Set-Content -Encoding byte ibm01146-iso-8859-15.cdx
Here we convert every byte value from 0 through 255 (that is what @(0..255) produces) from codepage ibm285, IBM EBCDIC (UK), to iso-8859-15, which includes the Euro currency symbol. We then save the result with .cdx, the file extension Connect:Direct on Windows uses for translation tables. Because both codepages use exactly one byte per character, the output is also 256 bytes long, and the byte at offset N holds the translation of byte value N.
We then do the same to produce a translation table that converts from ibm01146, IBM EBCDIC (UK-Euro), to iso-8859-15.
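Since a table built this way is just 256 bytes, where the byte at offset N is the translation of byte value N, applying it to data is a simple lookup per byte. Here is a minimal sketch of that idea, assuming the table layout described above. xlt_apply is a hypothetical helper for experimenting; it is not part of Connect:Direct and does not claim to reproduce exactly how Connect:Direct applies a .cdx file.

```powershell
# Hypothetical helper: translate a buffer with a 256-byte table.
# Each input byte value b is replaced by $tbl[b].
function xlt_apply ([byte[]]$tbl, [byte[]]$buf) {
    # A valid single-byte translation table has one entry per byte value
    if ($tbl.Length -ne 256) {
        throw "Translation table must be 256 bytes, got $($tbl.Length)"
    }
    $out = New-Object byte[] $buf.Length
    for ($i = 0; $i -lt $buf.Length; $i++) {
        # Look up the translated value for this input byte
        $out[$i] = $tbl[$buf[$i]]
    }
    # Output the bytes (PowerShell unrolls the array, so this pipes into hex or Set-Content)
    $out
}
```

For example, xlt_apply (cat -Encoding byte .\ibm285-iso-8859-15.cdx) (cat -Encoding byte .\input.ebcdic) | Set-Content -Encoding byte output.txt would translate a (hypothetical) EBCDIC file input.ebcdic using the ibm285 table.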
Using the PowerShell function below, we can list the differences between these two translation tables.
# List difference between translation tables
function xlt_diff ([byte[]]$tbla,[byte[]]$tblb) {
    0..255 | %{
        if($tbla[$_] -ne $tblb[$_]) {
            "{0:x} : {1:x} | {2:x}" -f $_,$tbla[$_],$tblb[$_]
        }
    }
}
The above function can be used as follows:
PS C:\Users\nicke> xlt_diff (cat -Encoding byte .\ibm285-iso-8859-15.cdx) (cat -Encoding byte .\ibm01146-iso-8859-15.cdx)
9f : 3f | a4
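The output above shows that the two translation tables differ only in how they map byte 0x9f: the first table maps it to 0x3f and the second to 0xa4. In iso-8859-15, 0xa4 is the Euro (€) symbol and 0x3f is a question mark. The character at 0x9f in ibm285 cannot be represented in iso-8859-15, so the conversion substitutes a question mark instead.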
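To check what a byte value actually stands for in a given codepage, you can decode it directly. Here is a small sketch using a hypothetical helper, xlt_char, built on the same .NET System.Text.Encoding class used throughout these posts.

```powershell
# Hypothetical helper: show the character a single byte represents in a codepage.
# $enc can be a codepage name ("iso-8859-15") or number (1146).
function xlt_char ($enc, [byte]$b) {
    $e = [System.Text.Encoding]::GetEncoding($enc)
    # Decode a one-byte buffer into a string
    $e.GetString([byte[]]@($b))
}
```

For example, xlt_char iso-8859-15 0xa4 returns €, xlt_char iso-8859-15 0x3f returns ?, and xlt_char ibm01146 0x9f returns € as well.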
Now let's create the translation tables for converting back from iso-8859-15 to ibm01146 and ibm285, and compare them:
PS C:\Users\nicke> conv iso-8859-15 ibm01146 @(0..255) | Set-Content -Encoding byte iso-8859-15-ibm01146.cdx

PS C:\Users\nicke> conv iso-8859-15 ibm285 @(0..255) | Set-Content -Encoding byte iso-8859-15-ibm285.cdx

PS C:\Users\nicke> xlt_diff (cat -Encoding byte .\iso-8859-15-ibm01146.cdx) (cat -Encoding byte .\iso-8859-15-ibm285.cdx)
a4 : 9f | 6f
Here the translation tables differ in how they convert the Euro (€) symbol in iso-8859-15 (0xa4) to the two mainframe codepages: the ibm01146 table maps it to 0x9f, while the ibm285 table maps it to 0x6f, which is the question mark in EBCDIC.
This is not surprising, as ibm01146 has the Euro (€ at 0x9f) and ibm285 does not. In fact, if you look up code page 1146 on Wikipedia, you will see that ibm01146 was created as ibm285 with the addition of the Euro (€) symbol.
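Having tables in both directions also lets you check which byte values do not survive a round trip, that is, bytes that come back different after being translated to iso-8859-15 and back again. Here is a minimal sketch of a hypothetical helper, xlt_roundtrip, written in the same style as xlt_diff.

```powershell
# Hypothetical helper: list byte values that change after a forward
# translation followed by a reverse translation.
function xlt_roundtrip ([byte[]]$fwd, [byte[]]$rev) {
    0..255 | %{
        $there = $fwd[$_]      # byte after the forward table
        $back = $rev[$there]   # byte after translating back again
        if ($back -ne $_) {
            "{0:x2} -> {1:x2} -> {2:x2}" -f $_, $there, $back
        }
    }
}
```

Run as xlt_roundtrip (cat -Encoding byte .\ibm285-iso-8859-15.cdx) (cat -Encoding byte .\iso-8859-15-ibm285.cdx), the output should include the line 9f -> 3f -> 6f: byte 0x9f becomes a question mark (0x3f) in iso-8859-15, and the reverse table turns that into the EBCDIC question mark (0x6f). Other bytes with no iso-8859-15 equivalent may be listed too.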
I chose these two codepages as a simple example of how to find the differences between translation tables.
These last two posts were about creating custom codepage translation tables for Connect:Direct, and spotting the differences between tables.
Next time we will look at an easier way to display what maps to what in these translation tables, and at a generally better way of translating from one codepage to another.

Thursday, February 18, 2016

Power Translation

While on an assignment some time ago, the only scripting language available to me was PowerShell, and it turned out to be a very useful tool. Today I will share some of the features that helped me work with Connect:Direct custom translation tables.
I needed to understand how certain characters were translated between the mainframe and Windows platforms, and to see quickly how a customised translation table differed from the default table on a Windows machine.
I also wanted to experiment with the translation tables without always having to be on the machine that had Connect:Direct installed.
My solution was a set of PowerShell helper functions that allowed me to examine, compare, generate and test Connect:Direct translation tables without needing Connect:Direct itself.
Here are some simple PowerShell functions to illustrate. Please be aware that I have deliberately kept these minimal for brevity.
# List all codepages
function lsenc () {
    [System.Text.Encoding]::GetEncodings()
}

# Get an object representing the codepage, by name ("ibm285") or number (1252)
function getenc ($str) {
    [System.Text.Encoding]::GetEncoding($str)
}

# Simple filter that displays bytes as hex values
function hex {
    $input | %{ write-host -NoNewline ("{0:x2} " -f $_)}
    # Output an empty string so the console ends the line
    ""
}

# Converts a string or a byte buffer from one codepage to another
function conv ($from,$to,$buf) {
    $from_enc=getenc $from
    $to_enc=getenc $to
    if($buf -is [string]) {
        # Encode the string in the source codepage first
        $bytes=$from_enc.GetBytes($buf)
    } else {
        # A byte array, a single byte, or file contents read as bytes
        $bytes=[byte[]]$buf
    }
    [System.Text.Encoding]::Convert($from_enc,$to_enc,$bytes)
}
The above functions can be used as follows:
# Test for well known EBCDIC value
PS C:\Users\nicke> conv 1252 37 " " | hex
40 
# Test for well known ASCII value
PS C:\Users\nicke> conv 37 1252 @(0x40) | hex
20 
# hex value for £ in Windows-1252
PS C:\Users\nicke> conv 1252 1252 "£" | hex
a3 
# hex value for £ in UTF-8
PS C:\Users\nicke> conv 1252 utf-8 "£" | hex
c2 a3 
# hex value for £ in UTF-16 (little-endian)
PS C:\Users\nicke> conv 1252 utf-16 "£" | hex
a3 00 
# hex value of £ in IBM EBCDIC (UK-Euro)
PS C:\Users\nicke> conv 1252 1146 "£" | hex
5b
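Because hex writes with Write-Host, its output is not sent down the pipeline, so it cannot simply be stored in a variable or compared. Here is a sketch of two hypothetical helpers in the same spirit: hexstr returns the dump as a string, and unhex turns a hex string back into bytes, so you can try conversions without creating a file first.

```powershell
# Hypothetical helper: like hex, but returns the hex dump as a string
function hexstr {
    ($input | %{ "{0:x2}" -f $_ }) -join " "
}

# Hypothetical helper: turn a string such as "c9 c2 d4" back into bytes
function unhex ([string]$str) {
    # Split on whitespace, drop empty pieces, parse each piece as base-16
    [byte[]]($str -split '\s+' | ?{ $_ } | %{ [Convert]::ToByte($_, 16) })
}
```

For example, conv 1146 1252 (unhex "c9 c2 d4 60 f1 f1 f4 f6") | hexstr returns 49 42 4d 2d 31 31 34 36, which is IBM-1146 in Windows-1252.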
You can also do the same with longer strings, and even with the contents of files.
For example, here is a file encoded in the mainframe codepage IBM-1146:
PS C:\Users\nicke> (cat -Encoding Byte ibm1146.1146) | hex
c9 c2 d4 60 f1 f1 f4 f6 

PS C:\Users\nicke>
I can translate it to Windows-1252 like so:
PS C:\Users\nicke> conv 1146 1252 (cat -Encoding Byte ibm1146.1146) | Set-Content -Encoding byte ibm1146.1252

PS C:\Users\nicke> type ibm1146.1252

IBM-1146

PS C:\Users\nicke>
As you can see, the third parameter of the conv function can be a string, a byte array, or the contents of a file read as bytes.
I wouldn't use this for large files, but for small files it is a handy way to understand codepages and how characters translate between them.
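Reading with cat -Encoding Byte is slow on bigger files because every byte becomes a separate PowerShell object. Here is a sketch of an alternative, assuming you are happy to read the whole file into memory: a hypothetical conv_file helper built on the .NET System.IO.File ReadAllBytes and WriteAllBytes methods. These also work on PowerShell 6 and later, where -Encoding Byte was replaced by the -AsByteStream parameter.

```powershell
# Hypothetical helper: convert a whole file from one codepage to another
function conv_file ($from, $to, $inpath, $outpath) {
    # .NET resolves relative paths against the process working directory,
    # which can differ from the current PowerShell location, so use full paths
    $inFull  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($inpath)
    $outFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($outpath)
    $from_enc = [System.Text.Encoding]::GetEncoding($from)
    $to_enc   = [System.Text.Encoding]::GetEncoding($to)
    # Read all bytes at once, convert, and write the result
    $bytes = [System.IO.File]::ReadAllBytes($inFull)
    [System.IO.File]::WriteAllBytes($outFull, [System.Text.Encoding]::Convert($from_enc, $to_enc, $bytes))
}
```

For example, conv_file 1146 1252 ibm1146.1146 ibm1146.1252 should produce the same file as the conv and Set-Content command above.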
Next time we will look at actual Connect:Direct translation tables, and how to create custom translation tables easily with some PowerShell functions.