Showing posts with label xlate. Show all posts

Tuesday, March 1, 2016

Day 23 Translation what's the difference

Day 23

Translation what’s the difference

First let’s create some Connect:Direct translation tables using PowerShell:
PS C:\Users\nicke> conv ibm285 iso-8859-15 @(0..255) | Set-Content -Encoding byte ibm285-iso-8859-15.cdx

PS C:\Users\nicke> conv ibm01146 iso-8859-15 @(0..255) | Set-Content -Encoding byte ibm01146-iso-8859-15.cdx
Here we convert from codepage ibm285, IBM EBCDIC (UK), to iso-8859-15, which has the Euro currency symbol. Every byte value from 0 through 255 is converted (that is what the @(0..255) means), and the result is saved with the Connect:Direct Windows translation table file extension, .cdx.
We then do the same thing to produce a translation table that converts from ibm01146, IBM EBCDIC (UK-Euro), to iso-8859-15.
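Because each table is simply the 256 converted bytes written out in order, the byte stored at offset N is what input byte N maps to. A minimal shell sketch of looking up a single entry on a UNIX machine follows; the function name xlt_lookup is hypothetical and not part of Connect:Direct, and the base#number arithmetic assumes bash or ksh:

```shell
# Hypothetical helper: print, in hex, the byte that a translation table
# maps a given input byte to. Usage: xlt_lookup table.cdx 9f
xlt_lookup() {
    # Skip to the offset given by the input byte (hex) and dump one byte.
    od -A n -t x1 -j "$((16#$2))" -N 1 "$1" | tr -d ' \n'
    echo
}
```

For example, xlt_lookup ibm285-iso-8859-15.cdx 9f would print the iso-8859-15 byte that ibm285 byte 0x9f is mapped to.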
Using the PowerShell function below we can see the difference between these two translation tables.
# List difference between translation tables
function xlt_diff ([byte[]]$tbla,[byte[]]$tblb) {
    0..255 | %{
        if($tbla[$_] -ne $tblb[$_]) {
            "{0:x} : {1:x} | {2:x}" -f $_,$tbla[$_],$tblb[$_]
        }
    }
}
The above function can be used as follows:
PS C:\Users\nicke> xlt_diff (cat -Encoding byte .\ibm285-iso-8859-15.cdx) (cat -Encoding byte .\ibm01146-iso-8859-15.cdx)
9f : 3f | a4
The output above shows that the two translation tables differ only in how they map hex byte value 0x9f: the first table maps it to 0x3f and the second to 0xa4.
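On a UNIX machine the same comparison can be sketched with cmp, which reports each differing byte as a 1-based decimal offset followed by the two byte values in octal. The function name xlt_diff_sh is an assumption for illustration, and the base#number arithmetic needs bash or ksh:

```shell
# Hypothetical shell counterpart of xlt_diff: list the offsets at which
# two translation table files differ, in the same "index : a | b" form.
xlt_diff_sh() {
    # cmp -l prints: 1-based decimal offset, then both bytes in octal.
    cmp -l "$1" "$2" | while read -r off a b; do
        # Convert to a 0-based hex offset and hex byte values.
        printf '%x : %x | %x\n' "$((off - 1))" "$((8#$a))" "$((8#$b))"
    done
}
```

It takes the two table files directly, so no byte arrays need to be read in first.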
Now we create the translation tables for translating back from iso-8859-15 to ibm01146 and ibm285, and compare them in the same way:
PS C:\Users\nicke> conv iso-8859-15 ibm01146 @(0..255) | Set-Content -Encoding byte iso-8859-15-ibm01146.cdx

PS C:\Users\nicke> conv iso-8859-15 ibm285 @(0..255) | Set-Content -Encoding byte iso-8859-15-ibm285.cdx

PS C:\Users\nicke> xlt_diff (cat -Encoding byte .\iso-8859-15-ibm01146.cdx) (cat -Encoding byte .\iso-8859-15-ibm285.cdx)
a4 : 9f | 6f
Here the translation tables differ in how they convert the Euro (€) symbol in iso-8859-15 (0xa4) to the two mainframe codepages: ibm01146 gets 0x9f, while ibm285 gets 0x6f, the EBCDIC question mark.
This is not surprising, as ibm01146 has the Euro (€) at 0x9f and ibm285 does not. In fact, if you look up code page 1146 on Wikipedia you will see that ibm01146 was created to be ibm285 with the addition of the Euro (€) symbol.
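A related check is whether a byte survives the trip there and back again. The sketch below is a hypothetical bash helper, not something from Connect:Direct; it assumes both files are plain translation tables of the same length, pushes every byte through a forward table and then a backward table, and lists the bytes that do not come back unchanged:

```shell
# Hypothetical helper (bash): report bytes that do not survive a
# round trip through a forward table and then a backward table.
# Usage: xlt_roundtrip forward.cdx backward.cdx
xlt_roundtrip() {
    local -a fwd back
    local i there again
    # Read each table as unsigned decimal byte values; -v keeps od from
    # collapsing repeated rows into "*".
    fwd=($(od -A n -v -t u1 "$1"))
    back=($(od -A n -v -t u1 "$2"))
    for i in "${!fwd[@]}"; do
        there=${fwd[$i]}
        again=${back[$there]}
        if [ "$again" != "$i" ]; then
            # input byte -> forward mapping -> mapped back again
            printf '%x -> %x -> %x\n' "$i" "$there" "$again"
        fi
    done
}
```

With the ibm285 pair of tables, bytes such as 0x9f would be expected to appear in the list, since the forward table turns 0x9f into the substitute 0x3f and the original value cannot be recovered from that.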
I chose these two codepages as a simple example to showcase finding the differences between translation tables.
These last two posts were about creating custom codepage translation tables for Connect:Direct, and spotting the differences between tables.
Next time we will look at displaying what maps to what more easily with these translation tables, and show a generally better way of translating from one codepage to another.

Sunday, December 9, 2012

Transformers



Day 17

Connect:Direct is available on both ASCII and EBCDIC character set machines. There are also many different code pages available to cater to different regions and languages. So it naturally comes about that translations of one character set to another will be needed from time to time.

In Connect:Direct this happens by default for translations between ASCII and EBCDIC, in either direction, for DATATYPE=TEXT files. For custom requirements, this was traditionally achieved using translation tables, referred to within the SYSOPTS clause of a Connect:Direct process.

Translation tables come in two flavours: Single Byte Character Set (SBCS) and Double Byte Character Set (DBCS).

For SBCS translation tables, I have in the past had to decode a "custom_translation.xlt" because the source used to build it was not available. To do this on UNIX I wrote a short shell function that takes a binary .xlt file and produces the source to build the SBCS translation table.

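function dxlt
{
 # Expect exactly one argument: the binary translation table file.
 if [[ $# -ne 1 ]]
 then
  echo
  echo "Usage: dxlt file.xlt"
  echo
  echo "Dumps the C:D translation table file.xlt."
  echo
  return
 fi
 echo
 echo " 0 1 2 3 4 5 6 7 8 9 a b c d e f"
 echo
 # Hex dump every row (-v stops od collapsing repeated rows into "*").
 # cut keeps the last two digits of od's address as the row label (this
 # assumes a seven-digit address), and sed blanks the final address-only line.
 od -A x -v -t x1 "$1" | cut -c6- | sed 's/^00$//'
}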

Below is an example of using the above shell function:

$ dxlt custom_translation.xlt

 0 1 2 3 4 5 6 7 8 9 a b c d e f

00 00 01 02 03 37 2d 2e 2f 16 05 25 0b 0c 0d 0e 0f
10 10 11 12 13 3c 3d 32 26 18 19 3f 27 1c 1d 1e 1f
20 40 5a 7f 7b 5b 6c 50 7d 4d 5d 5c 4e 6b 60 4b 61
30 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 7a 5e 4c 7e 6e 6f
40 7c c1 c2 c3 c4 c5 c6 c7 c8 c9 d1 d2 d3 d4 d5 d6
50 d7 d8 d9 e2 e3 e4 e5 e6 e7 e8 e9 ad e0 bd 5f 6d
60 79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96
70 97 98 99 a2 a3 a4 a5 a6 a7 a8 a9 c0 4f d0 bc 07
80 20 21 22 23 24 15 06 17 28 29 2a 2b 2c 09 0a 1b
90 30 31 1a 33 34 35 36 08 38 39 3a 3b 04 14 3e e1
a0 41 aa 43 44 45 46 47 48 49 51 52 53 54 55 56 57
b0 58 59 62 63 64 65 66 67 68 69 70 71 72 73 74 ab
c0 76 77 78 80 8a 8b 8c 8d 8e 8f 90 9a 9b 9c 9d 9e
d0 9f a0 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7
e0 b8 b9 ba bb bc bd be bf ca cb cc cd ce cf da db
f0 dc dd de df ea eb ec ed ee ef fa fb fc fd fe ff
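The row labels in the dump above come from od's own address column, so the cut offset depends on how many address digits the local od prints (GNU od, as far as I am aware, prints six hex digits rather than seven). A variant that should not depend on that, sketched under the assumption that od prints 16 bytes per row, which is its usual default for -t x1; the name dxlt_portable is hypothetical:

```shell
# Hypothetical variant of dxlt that does not rely on od's address width.
dxlt_portable() {
    echo
    echo "    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f"
    echo
    # -A n drops od's own addresses; awk adds a two-digit hex row label.
    od -A n -v -t x1 "$1" |
        awk '{ printf "%02x", (NR - 1) * 16
               for (i = 1; i <= NF; i++) printf " %s", $i
               print "" }'
}
```

The data rows should have the same layout as the dxlt dump, with the column header spaced to line up over them.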

Another useful function is called "chars" which just shows you the printable characters within the current locale.

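function chars
{
 echo
 echo "     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f"
 # Emit every byte value 0-255, dump them as characters, then blank out
 # the octal escapes od prints for bytes that are not printable.
 nawk 'BEGIN{for(i=0;i<=255;i++){printf "%c",i}}' | od -A x -t c $1 | cut -c6- | \
 sed 's/^00$//;s/[0-9][0-9][0-9]/   /g;s/   /  /g'
}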
An example of using the above function is:
$ chars

   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f

00 \0                   \a \b \t \n \v \f \r
10
20    !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
30 0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
40 @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
50 P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
60 `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
70 p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~
80
90
a0    ¡  ¢  £  ¤  ¥  ¦  §  ¨  ©  ª  «  ¬  ®  ¯
b0 °  ±  ²  ³  ´  µ  ¶  ·  ¸  ¹  º  »  ¼  ½  ¾  ¿
c0 À  Á  Â  Ã  Ä  Å  Æ  Ç  È  É  Ê  Ë  Ì  Í  Î  Ï
d0 Ð  Ñ  Ò  Ó  Ô  Õ  Ö  ×  Ø  Ù  Ú  Û  Ü  Ý  Þ  ß
e0 à  á  â  ã  ä  å  æ  ç  è  é  ê  ë  ì  í  î  ï
f0 ð  ñ  ò  ó  ô  õ  ö  ÷  ø  ù  ú  û  ü  ý  þ  ÿ

Together these two functions are very useful for sorting out translation table problems where a UNIX machine is involved.
As with many problems, it is important to understand the context surrounding the issue at hand.
For codepage translation tables this means looking at what type of file is being translated, which codepage was used to produce it, which translation table was used to transform it, and which codepage is being used to view or process it at the destination. If these are not taken into account, translation table issues can be very difficult to solve.
In the next post I will walk through a particular codepage translation problem using the above functions.