Sunday, April 25, 2010

FileAgent: Rules of the game

Day 9

On Windows you can configure FileAgent by selecting the shortcut on the desktop or using the Start menu "Configure Connect Direct FileAgent".  You should see something like the following:



The FileAgent comes with a default configuration file, which is what is shown above.  You select the configuration you want to work with in the left pane of the application.  In the right pane you have two tabs.  The tab showing above is the "FileAgent" tab, where we configure FileAgent to connect to Connect:Direct via the API.

We also list the directories we want FileAgent to watch.  There should be no reason to change the default values that were already in the configuration, unless they clash with some other software already using those values.

You can have multiple instances of FileAgent running, in which case each one needs its own "Gate Keeper" port.  In most cases that would not be necessary.

By default a running FileAgent uses this "Default_Config.ser" configuration file.  You can override that with options on the command line if you run FileAgent from the command prompt, or by editing the appropriate .lax file to change the arguments that get passed to FileAgent.  When running as a Windows service that file is "cdfa$.lax"; when running from the command prompt or shortcut on Windows, or on UNIX, it is "cdfa.lax".

If you just use the "Default_Config.ser" then you will not need to override anything.



On the "Rules" tab you can see a "Submit Process Rules" tab within it.  This is where we define our rules to automatically submit C:D processes for a new file detected in a watched directory.  If there are any problems with submitting a process from FileAgent, the first things to check are the "Enabled" check-box on this tab and the one within the definition of the matching file pattern, as can be seen on the next screenshot.


In this rule I have defined a match for all the .dat files within the watch directory.  On detecting a new file that matches this pattern, FileAgent will submit the C:D process specified in the "Process Name" field, with the arguments to be passed to the C:D process given in the next field.
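To make the matching step concrete, here is a minimal Python sketch of the decision a rule like this represents.  It is not FileAgent's own code.  The `SubmitRule` class, the process name `SENDFILE`, the path `C:\watch\extract.dat` and the case-insensitive comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath


@dataclass
class SubmitRule:
    """Hypothetical stand-in for one 'Submit Process' rule."""
    pattern: str         # file name pattern, e.g. "*.dat"
    process_name: str    # C:D process to submit on a match (name is made up)
    enabled: bool = True  # mirrors the "Enabled" check-box


def rule_matches(rule: SubmitRule, file_path: str) -> bool:
    """Return True if the rule is enabled and its pattern matches the file name."""
    if not rule.enabled:
        return False
    # PureWindowsPath accepts both "\" and "/" as separators.
    name = PureWindowsPath(file_path).name
    # Assumption: compare case-insensitively, as Windows file names usually are.
    return fnmatchcase(name.lower(), rule.pattern.lower())


if __name__ == "__main__":
    rule = SubmitRule(pattern="*.dat", process_name="SENDFILE")
    print(rule_matches(rule, r"C:\watch\extract.dat"))
    print(rule_matches(rule, r"C:\watch\extract.txt"))
```

A disabled rule never matches, which is why the "Enabled" check-boxes are the first thing to look at when nothing gets submitted.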

In my C:D process I have some variables that take their values from the "Process arguments" field within this FileAgent rule.  The C:D process variables are &NODE, which is assigned CD.REMOTE, and &FILESPEC and &NAME, which take their values from the built-in FileAgent variables "%FA_FILE_FOUND." and "%FA_NOT_PATH." respectively.  It is important to note that these special FileAgent variable names start with a "%" and end with a "." .  Miss off the period at the end of the name and they will not work.
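The substitution can be pictured with a small Python sketch.  It assumes that "%FA_FILE_FOUND." expands to the full path of the detected file and "%FA_NOT_PATH." to the file name without its path.  The function name, the sample path and the exact layout of the arguments string are assumptions, as the real "Process arguments" field is only visible in the screenshot.

```python
from pathlib import PureWindowsPath


def expand_fa_variables(arguments: str, file_found: str) -> str:
    """Replace the two FileAgent variables used in this post.

    Only the exact names, including the trailing period, are replaced.
    """
    name_only = PureWindowsPath(file_found).name
    return (arguments
            .replace("%FA_FILE_FOUND.", file_found)
            .replace("%FA_NOT_PATH.", name_only))


if __name__ == "__main__":
    # Hypothetical arguments string and file path, for illustration only.
    args = "&NODE=CD.REMOTE &FILESPEC=%FA_FILE_FOUND. &NAME=%FA_NOT_PATH."
    print(expand_fa_variables(args, r"C:\watch\extract.dat"))
    # A variable written without its trailing period is left untouched.
    print(expand_fa_variables("&NAME=%FA_NOT_PATH", r"C:\watch\extract.dat"))
```

The second call shows the failure mode described above: without the closing period the name is not recognised, so the text is passed through unchanged.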



Above you can see the C:D process that is used by the FileAgent rule.  Previously when we sent a test file using the C:D Requester we used the "Send/Receive File" function in the left hand pane of the Requester which presented you with a form which we filled out.  Behind the scenes that form constructed a textual C:D process like the one you see above.

I will go into C:D processes in more detail at a later date.  For now hopefully you see some sort of script with a COPY command where the source of the copy is specified by &FILESPEC and the destination by &NAME .  Also note that the destination node or secondary node (SNODE) is specified by the C:D process variable &NODE .

This means that potentially we could use this single C:D process template for all our FileAgent rules.

If the transfer is successful then the source file is removed as it has been safely received at the destination.

Next post I will explain how to run the FileAgent and test that this rule is working.

Thursday, April 15, 2010

FileAgent: Keeping a watchful eye


Day 8

The Connect:Direct FileAgent is a Java application that runs on Windows, Linux & UNIX.  It comes with Connect:Direct and is free.

The FileAgent is useful as you can give it many directories to watch and define some rules to decide whether or not it should perform some action on any new files that appear in the directories you gave it.

The action to be performed is whatever can be put in a Connect:Direct process. So that process could forward a file onward to another C:D node, or run a local program to process the file.

This is useful as it reduces the amount of scripting you might otherwise have to do to provide the same effect.  For instance you may have an application that produces a file in a certain directory which is to be sent to another C:D node.  This might be some sort of database extract that has to be sent to an ETL (Extract Transform & Load) platform before being loaded into a data warehouse.

This could be done in a few different ways: 

You could get the application people to add a call to the Connect:Direct command line to their extract script, to initiate a transfer immediately after the extract is complete.  If the application team are fine doing that then this would be the most efficient way to do it.  The application team might be uncomfortable doing this as they might see it as the C:D team's responsibility.  They might prefer C:D just to pick up the file when it appears in the said directory.

You could write a script that watches for new files in a directory and transfers them onward to the ETL C:D node.  This would make things easy for the application team as they only have to tell you the directory the file will appear in and the file name to look for and where to forward the file to.  It can be tricky to get the script to cover all the cases that the script may encounter.  If you have a mixed environment including Windows and UNIX you would have to solve the problem twice as the platforms are scripted differently.

You could just add the directory and a simple rule to FileAgent using a template C:D process.  As the FileAgent is a Java application it looks and behaves very similarly on Windows & UNIX with only a couple of exceptions.
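To give a feel for the second option, here is a minimal sketch of a hand-written watcher in Python.  It is not how FileAgent works internally, just a simple polling loop.  The function names, the polling interval and the `handle` callback (which would submit the C:D process) are assumptions for illustration.

```python
import time
from fnmatch import fnmatch
from pathlib import Path


def find_new_files(directory: Path, pattern: str, seen: set) -> list:
    """Return matching files not seen on an earlier poll, and remember them."""
    new_files = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and fnmatch(path.name, pattern) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(directory: Path, pattern: str, handle, interval: float = 10.0) -> None:
    """Poll forever, calling handle(path) once for each new matching file."""
    seen = set()
    while True:
        for path in find_new_files(directory, pattern, seen):
            handle(path)
        time.sleep(interval)
```

Even this short loop shows where the tricky cases come from: it does not check that a file has finished being written, it forgets what it has seen when restarted, and it would need its own logging and error handling before it could be trusted in production.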

In the next post I will look at how FileAgent is configured and used in practice.

Sunday, April 11, 2010

The other way around





Day 7

My brother Lee only needed to use his connection in one direction, from his Windows node to the remote UNIX node.
If he had needed to do it the other way around, this is what I would have told him.

The concepts are the same, just presented differently as I said previously. We already have the netmap configuration, but we will need to make a configuration to allow the remote UNIX machine to send files to our local node.

The files that are received by our local node from the CD.REMOTE will have to be owned by a user account on our local node.  We could just use any local user account but it would be better to create a user specifically for this purpose and lock down the directories as mentioned previously when we set up the UNIX side of the configuration.

First let's set up Connect:Direct to allow a local user appusr1 to work with Connect:Direct.  Double-click the "User Authorities" in the left pane of the Connect:Direct Requester.  As you can see in the image below there are some entries that start with an asterisk which indicates they are templates.  In our case we are defining appusr1 as a non-administrative user of Connect:Direct.




Click the "New Genuser" and fill in as below. The controls will inherit from the templates you saw earlier and of course you can override those values here if you desire.



So how do we associate an incoming remote transfer with the local user we want to own the files?  This is done using the proxy definition.  Double-click "Proxies" in the left hand pane of the Requester and enter the fields as below.


So we are saying here that the remote user appusr1 from the remote node CD.REMOTE will be mapped to the local user appusr1.  It doesn't have to be the same name as I have done here.  Normally I create user accounts that reflect the name of the application that will consume the file if a suitable user account doesn't already exist.

If we didn't define a proxy for the remote Connect:Direct user, the remote Connect:Direct process would have to supply a SNODEID containing a local user account name on our local node together with a password.

This is not good for several reasons.  It is not a good idea to advertise local account information to anyone outside of your organisation.  Hard coding passwords in scripts of any kind is also a bad idea because they might be seen.  And if you have an information security department they will want you to change your passwords frequently, which would mean amending every script that contains a password just as often.

The nice thing about Connect:Direct proxies is that the remote node doesn't care about what local user account you might be using, and they do not have to supply a password.  If their user is authorised to use Connect:Direct on the remote node they do not need to do anything regarding the user being used as you have covered that in the proxy definition.

For my brother's connection they were not using Secure+, either to encrypt the data in transit or to authenticate using digital certificates.  If you read my post "Secure IT" then you will know that I always use Secure+.  In a later post I will give an example of doing just that.


Sunday, April 4, 2010

Allowing Connections

Day 6

Earlier I explained how to make a connection and send a file to another Connect:Direct node, in this case a 3rd party. I didn't explain the steps that the remote node administrator makes in order to allow your connection to him.

First I will explain how this was done to allow in this case a remote connection to a UNIX Connect:Direct node. In the next post I will explain how this is done in the opposite direction i.e. from the UNIX machine to the local Windows Connect:Direct node.

Most of the configuration of Connect:Direct on a UNIX machine is done by editing files. The files we will be talking about today are the netmap.cfg & userfile.cfg files. Conceptually the information in the Windows Connect:Direct configuration is very similar to that stored in the UNIX Connect:Direct files. It is just the presentation of the information that is different. This is a common theme when dealing with Connect:Direct on different platforms.

Here is the netmap entry for the Windows Connect:Direct node NICKE in the UNIX Connect:Direct configuration file netmap.cfg :
NICKE:\
:comm.info=192.168.10.10;1364:\
:comm.transport=tcp:\
:contact.name=:\
:contact.phone=:\
:descrip=:


The netmap is used to associate a network address with a Connect:Direct node name. So here you can see that the node name NICKE is associated with the IP address 192.168.10.10 and the port 1364.  The node name, comm.info and comm.transport lines are the important ones; the other lines are descriptive.

The format of the configuration entry is important. You can see that field names are followed by an "=" sign and then the value for that field.
Each of these field/value pairs is surrounded by colons, and each line ends with "\" except the last line of the configuration entry.
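The format rules above are simple enough to sketch as a small Python parser.  This is only an illustration of the layout, not a tool that ships with Connect:Direct.  The function names are made up, and the sketch assumes that values never contain a colon.

```python
# The netmap entry from this post, kept verbatim in a raw string.
NETMAP_ENTRY = r"""NICKE:\
:comm.info=192.168.10.10;1364:\
:comm.transport=tcp:\
:contact.name=:\
:contact.phone=:\
:descrip=:
"""


def parse_entry(text: str) -> tuple:
    """Parse one configuration entry into (name, {field: value})."""
    # Join the continuation lines, dropping the trailing "\" from each one.
    joined = "".join(line.strip().rstrip("\\") for line in text.splitlines())
    # Everything before the first colon is the entry name.
    name, _, rest = joined.partition(":")
    fields = {}
    for pair in rest.split(":"):
        if "=" in pair:
            key, _, value = pair.partition("=")
            fields[key] = value
    return name, fields


def split_comm_info(value: str) -> tuple:
    """Split a comm.info value of the form 'address;port'."""
    host, _, port = value.partition(";")
    return host, int(port)


if __name__ == "__main__":
    name, fields = parse_entry(NETMAP_ENTRY)
    print(name, split_comm_info(fields["comm.info"]), fields["comm.transport"])
```

Because the same layout is used by the other UNIX configuration files, the same function would also read a userfile.cfg entry.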

I talk about configuration entry, as the same format is used by other Connect:Direct UNIX configuration files such as the userfile.cfg which I'll talk about next.

*@NICKE:\
:local.id=appusr1:\
:pstmt.upload=y:\
:pstmt.upload_dir=/home/appusr1/data/outbound:\
:pstmt.download=y:\
:pstmt.download_dir=/home/appusr1/data/inbound:\
:descrip=:


Above you can see that it does indeed have a similar format to the netmap.cfg file, but the meaning is different. Here the first line says any user ("*") from ("@") the remote node name "NICKE" is associated with the following information.  While this might be OK for testing, in practice you should use a specific remote Connect:Direct user.  In a later post I'll go into more detail about that.
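The way a "user@node" key selects an entry can be sketched in a few lines of Python.  The function name and the user "bob" are hypothetical.  Preferring an exact entry over the "*" wildcard is my assumption for the sketch, not something taken from the Connect:Direct documentation.

```python
def find_proxy(proxies: dict, remote_user: str, remote_node: str):
    """Return the proxy entry for remote_user@remote_node, or None.

    Assumption: an exact "user@node" entry wins over "*@node".
    """
    exact = f"{remote_user}@{remote_node}"
    wildcard = f"*@{remote_node}"
    if exact in proxies:
        return proxies[exact]
    return proxies.get(wildcard)


if __name__ == "__main__":
    proxies = {"*@NICKE": {"local.id": "appusr1"}}
    print(find_proxy(proxies, "bob", "NICKE"))   # falls back to the wildcard entry
    print(find_proxy(proxies, "bob", "OTHER"))   # no entry for that node
```

With only the wildcard entry defined, every user from NICKE ends up as appusr1, which is why a specific remote user is the safer choice outside of testing.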

When a file is transferred from NICKE to CD.REMOTE the file is stored somewhere on the file system and is owned by some user on the system. In this case the owner of the files will be "appusr1", and the file will be stored under the "/home/appusr1/data" directory.

So you can see that the netmap entry is used to make the connection between Connect:Direct nodes and the userfile entry is used to say who can connect and where the files should live and who they will be owned by.

You can think of the following entry for the local user as default values for the above definition which is often referred to as a "proxy" user definition.
So if you do not override a field in the proxy definition the value will be taken from the local user definition pointed to by the "local.id" field in the proxy definition.
appusr1:\
:admin.auth=n:\
:pstmt.copy.ulimit=y:\
:pstmt.upload=y:\
:pstmt.upload_dir=:\
:pstmt.download=y:\
:pstmt.download_dir=:\
:pstmt.runjob=y:\
:pstmt.runtask=y:\
:pstmt.run_dir=:\
:pstmt.submit_dir=:\
:pstmt.copy=y:\
:cmd.submit=y:\
:cmd.stopnmd=n:\
:cmd.trace=n:\
:cmd.chgproc=n:\
:name=:\
:phone=:\
:descrip=:\
:snode.ovrd=y:
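The "defaults" relationship between the two entries can be pictured as a dictionary merge.  This Python sketch is a simplification under stated assumptions: the function name is made up, and treating an empty value in the proxy definition as "not overridden" is my assumption.

```python
def effective_settings(proxy: dict, local_user: dict) -> dict:
    """Start from the local user's values and apply the proxy's overrides."""
    merged = dict(local_user)
    for field, value in proxy.items():
        # Assumption: an empty value in the proxy means "not overridden".
        if value != "":
            merged[field] = value
    return merged


if __name__ == "__main__":
    # A few fields taken from the two entries in this post.
    local_user = {"pstmt.upload": "y", "pstmt.upload_dir": "", "pstmt.runjob": "y"}
    proxy = {
        "local.id": "appusr1",
        "pstmt.upload_dir": "/home/appusr1/data/outbound",
        "descrip": "",
    }
    for field, value in sorted(effective_settings(proxy, local_user).items()):
        print(f"{field}={value}")
```

Here the upload directory comes from the proxy definition, while pstmt.runjob, which the proxy does not mention, is taken from the local user definition.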


I will not explain all the fields as the "Connect:Direct for UNIX Administration Guide" does a good job of that.

I will say that it is a good idea to lock down the proxy definition with specific upload/download/run/submit directories, and that the local user associated with the proxy definition should not be a privileged user such as root or the user who acts as the administrator of Connect:Direct.

So the next post will describe how to allow transfers in the opposite direction on the Windows Connect:Direct node NICKE.