Sunday, December 9, 2012

Transformers



Day 17

Connect:Direct is available on both ASCII and EBCDIC character set machines. There are also many different code pages available to cater to different regions and languages. Translation from one character set to another is therefore needed from time to time.

Connect:Direct translates between ASCII and EBCDIC in either direction by default for DATATYPE=TEXT files. For custom requirements, this was traditionally achieved using translation tables, which are referred to within the SYSOPTS clause of a Connect:Direct process.

Translation tables come in two flavours: Single Byte Character Set (SBCS) and Double Byte Character Set (DBCS).

In the past I have had to decode an SBCS "custom_translation.xlt" because the source used to build it was not available. To do this on UNIX I wrote a short shell function that takes a binary .xlt file and produces the source needed to build the SBCS translation table.

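function dxlt
{
 # Dump a Connect:Direct SBCS translation table (.xlt) as a 16 x 16 hex grid.
 if [[ $# -ne 1 ]]
 then
  echo
  echo "Usage: dxlt file.xlt"
  echo
  echo "Dumps the C:D translation table file.xlt."
  echo
  return 1
 fi
 # Column header: the low nibble of the input byte.
 printf '\n    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f\n\n'
 # -v stops od collapsing repeated rows into "*". cut -c6- assumes od prints
 # 7-character offsets and keeps the last two digits as the row label.
 # sed blanks the final offset-only line.
 od -A x -v -t x1 "$1" | cut -c6- | sed 's/^00$//'
}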

Below is an example of using the above shell function:

$ dxlt custom_translation.xlt

    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f

00 00 01 02 03 37 2d 2e 2f 16 05 25 0b 0c 0d 0e 0f
10 10 11 12 13 3c 3d 32 26 18 19 3f 27 1c 1d 1e 1f
20 40 5a 7f 7b 5b 6c 50 7d 4d 5d 5c 4e 6b 60 4b 61
30 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 7a 5e 4c 7e 6e 6f
40 7c c1 c2 c3 c4 c5 c6 c7 c8 c9 d1 d2 d3 d4 d5 d6
50 d7 d8 d9 e2 e3 e4 e5 e6 e7 e8 e9 ad e0 bd 5f 6d
60 79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96
70 97 98 99 a2 a3 a4 a5 a6 a7 a8 a9 c0 4f d0 bc 07
80 20 21 22 23 24 15 06 17 28 29 2a 2b 2c 09 0a 1b
90 30 31 1a 33 34 35 36 08 38 39 3a 3b 04 14 3e e1
a0 41 aa 43 44 45 46 47 48 49 51 52 53 54 55 56 57
b0 58 59 62 63 64 65 66 67 68 69 70 71 72 73 74 ab
c0 76 77 78 80 8a 8b 8c 8d 8e 8f 90 9a 9b 9c 9d 9e
d0 9f a0 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7
e0 b8 b9 ba bb bc bd be bf ca cb cc cd ce cf da db
f0 dc dd de df ea eb ec ed ee ef fa fb fc fd fe ff
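
The "cut -c6-" step in dxlt relies on od printing seven-character offsets, which is what the output above implies for the system it was written on. Some od implementations (GNU od, as far as I know) print six hex digits with "-A x", which would shift the row labels. Below is a minimal sketch of a variant that avoids the dependency by suppressing od's offsets and generating the row labels in awk. The name dxlt_portable is hypothetical and is not part of any Connect:Direct tooling.

```shell
# dxlt_portable: hypothetical variant of dxlt that does not depend on the
# width of the offsets printed by od.
dxlt_portable() {
  if [ $# -ne 1 ]; then
    echo "Usage: dxlt_portable file.xlt" >&2
    return 1
  fi
  printf '\n    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f\n\n'
  # -A n drops od's own offsets; -v prints repeated rows instead of "*".
  # awk normalises the spacing and prefixes each row with its hex offset.
  od -A n -v -t x1 "$1" |
    awk '{ $1 = $1; printf "%02x %s\n", (NR - 1) * 16, $0 }'
  echo
}
```

It is called in the same way as dxlt, for example "dxlt_portable custom_translation.xlt".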

Another useful function is called "chars" which just shows you the printable characters within the current locale.

function chars
{
 echo
 echo "     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f"
 nawk 'BEGIN{for(i=0;i<=255;i++){printf "%c",i}}' | od -A x -t c $1 | cut -c6- | \
 sed 's/^00$//;s/[0-9][0-9][0-9]/   /g;s/   /  /g'
}

An example of using the above function is:

$ chars

   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f

00 \0                   \a \b \t \n \v \f \r
10
20    !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
30 0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
40 @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
50 P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
60 `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
70 p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~
80
90
a0    ¡  ¢  £  ¤  ¥  ¦  §  ¨  ©  ª  «  ¬  ®  ¯
b0 °  ±  ²  ³  ´  µ  ¶  ·  ¸  ¹  º  »  ¼  ½  ¾  ¿
c0 À  Á  Â  Ã  Ä  Å  Æ  Ç  È  É  Ê  Ë  Ì  Í  Î  Ï
d0 Ð  Ñ  Ò  Ó  Ô  Õ  Ö  ×  Ø  Ù  Ú  Û  Ü  Ý  Þ  ß
e0 à  á  â  ã  ä  å  æ  ç  è  é  ê  ë  ì  í  î  ï
f0 ð  ñ  ò  ó  ô  õ  ö  ÷  ø  ù  ú  û  ü  ý  þ  ÿ

Together these two functions are very useful for sorting out translation table problems where a UNIX machine is involved.
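
To check a single mapping without scanning the whole grid, the byte at a given offset can be read straight from the table. In the dxlt output earlier, the row label plus the column heading is the input byte and the cell is the byte it becomes (offset 41 holds c1, for example). The following is a rough sketch only: xlt_lookup is a hypothetical helper name, and it assumes the .xlt file is a plain single-byte table with no header, as the dump suggests.

```shell
# xlt_lookup: hypothetical helper, prints the output byte (hex) that a
# table maps a given input byte (hex) to.
xlt_lookup() {
  if [ $# -ne 2 ]; then
    echo "Usage: xlt_lookup file.xlt hexbyte" >&2
    return 1
  fi
  # -j skips to the input byte's offset, -N 1 reads just that one byte.
  od -A n -v -t x1 -j "$((0x$2))" -N 1 "$1" | tr -d ' \n'
  echo
}
```

Under those assumptions, "xlt_lookup custom_translation.xlt 41" would print c1 for the table dumped earlier.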
As with many problems, it is important to understand the context surrounding the issue at hand. For codepage translation tables this means looking at what type of file is being translated, which codepage was used to produce the file in question, which translation table was used to transform it, and which codepage is being used to view or process it at the destination. If these are not taken into account, translation table issues can be very difficult to solve.
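
One way to test a suspicion about which table was applied is to run a sample of the source file through a candidate table and compare the result with what arrived at the destination. The sketch below prints each input byte alongside the byte the table maps it to, in hex. xlt_preview is a hypothetical name, and the function assumes a headerless single-byte table. It does nothing beyond a byte-for-byte lookup, so it is not a substitute for Connect:Direct's own processing.

```shell
# xlt_preview: hypothetical helper, shows how each byte of a data file
# would be mapped by a single-byte translation table.
xlt_preview() {
  if [ $# -ne 2 ]; then
    echo "Usage: xlt_preview file.xlt datafile" >&2
    return 1
  fi
  {
    od -A n -v -t x1 "$1"    # table bytes, in offset order
    echo "--"                # separator between table and data
    od -A n -v -t x1 "$2"    # bytes to translate
  } | awk '
    # Switch from loading the table to translating data at the separator.
    $1 == "--" { in_data = 1; next }
    # Table section: the n-th byte is the mapping for input byte n.
    !in_data   { for (i = 1; i <= NF; i++) map[sprintf("%02x", n++)] = $i; next }
    # Data section: print "input -> output", or ?? if the table is too short.
               { for (i = 1; i <= NF; i++)
                   printf "%s -> %s\n", $i, (($i in map) ? map[$i] : "??") }
  '
}
```

Comparing this output with a hex dump of the received file should show fairly quickly whether the candidate table is the one that was actually used.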
In the next post I will walk through a particular codepage translation problem using the above functions.
