From: IN%"POSTMASTER@EMBL.BITNET" "General PostMaster" 7-FEB-1990 17:23:30.49
To: HARPER@cc.Helsinki.FI
CC:
Subj: Automatic response to : GET SOFTWARE:BIOBIT.6

Received: from jnet-daemon by cc.Helsinki.FI; Wed, 7 Feb 90 17:22 EET DST
Received: From EMBL(NETSERV) by FINUHB with Jnet id 5823 for HARPER@FINUH;
 Wed, 7 Feb 90 17:22 O
Date: Wed, 07 Feb 90 16:05:45
From: EMBL Network File Server
Subject: Automatic response to : GET SOFTWARE:BIOBIT.6
To: HARPER@cc.Helsinki.FI
Reply-to: General PostMaster
Organisation: European Molecular Biology Laboratory
Postal-address: Meyerhofstrasse 1, 6900 Heidelberg, W. Germany

BIOBIT No 6

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                  << EDITED BY ROBERT HARPER >>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

THE MYSTERIES OF ARC AND UUDECODE REVEALED

This issue of BIOBIT deals with the use of two sets of programmes,
namely programmes for ARCING/UNARCING and UUENCODING/UUDECODING.

But first a bit of imagery. Let's imagine a jack-in-the-box. The basic
mechanism is a spring which has been compressed, and when you open the
box the puppet pops out. Well, that is what the ARC programme does. It
allows you to take a file and compress it to a much smaller size than
the original, ship it from one place to another, and then uncompress
it so that it pops out to its original size. The obvious applications
are large documents or spreadsheets. Some ARCing programmes can
compress text files (.DOC or .TXT) to 50% of their original size,
whereas the best compression for programmes (.EXE or .COM) is
generally about 30-40%, depending on the type of ARCing programme that
you use.
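
The jack-in-the-box idea can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: it uses
Python's zlib module as a stand-in compressor (ARC has its own
compression methods, which are not reproduced here), and the function
names squeeze, pop_out and percent_saved are invented for this
example.

```python
import zlib


def squeeze(data: bytes) -> bytes:
    """Compress the spring. zlib is a stand-in here, not ARC's method."""
    return zlib.compress(data, 9)


def pop_out(packed: bytes) -> bytes:
    """Open the box: recover exactly the bytes that were compressed."""
    return zlib.decompress(packed)


def percent_saved(original: bytes, packed: bytes) -> float:
    """Percentage by which the packed form is smaller than the original."""
    return 100.0 * (1 - len(packed) / len(original))


if __name__ == "__main__":
    # Made-up sample text; real documents will compress differently.
    text = b"Plasmid maps are circular. " * 200
    packed = squeeze(text)
    assert pop_out(packed) == text  # nothing is lost on the way
    print(len(text), "bytes ->", len(packed), "bytes")
    print("saved %.0f%%" % percent_saved(text, packed))
```

How much is saved depends heavily on the input: repetitive text like
the sample shrinks far more than a typical programme file would.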

Archiving is a very "hot" topic at the moment. On EARN/BITNET the
standard is the ARC format. On USENET there is a debate going on about
whether or not to change from ARC to ZOO, because ZOO is available for
a variety of different operating systems (MSDOS, UNIX, VMS) and it
would be nice to have an archiving system that is compatible across
many different systems. And then there is a new archiver called PKZIP,
which gives very good compression percentages as well as being very
fast at extracting files from .ZIP files. The two main things that
people look for when they use an ARCing programme are percentage
compression and speed of extraction. PKZIP does very well on both
counts, and this is why it has an enthusiastic following.

It is hard to think of an analogy for UUENCODING. Anyway, here goes.
Let's compare a computer programme to an enzyme. The enzyme usually
performs a specific task. So does a programme. An enzyme needs a
certain optimal temperature and pH before it works. If you don't
handle it properly it can become inactive and useless. Well, that is
what happens when you try to send programmes over the network. They
get trashed at the various gateways, and when they arrive, due to the
rough handling they have had, they "lose all their activity" and do
not work. (EBCDIC->ASCII translation ;-) )

Now an enzyme is only a certain arrangement of amino acids, and the
original code for them is to be found in a particular DNA sequence. So
in theory, if you know the original DNA sequence you should be able to
make the enzyme from it... basic protein engineering. Well, UUENCODE
takes a programme (.EXE or .COM) and translates it into a code. This
code is pure ASCII and it can be safely transferred over the network.
When it arrives at its destination it is UUDECODED to produce the
original programme once again... a sort of reverse engineering.
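
The encode/decode round trip described above can be tried out with a
minimal sketch in Python. It relies on the standard binascii module;
the helper names uu_encode and uu_decode, the sample file name and the
"644" mode field on the begin line are assumptions made for
illustration, and real UUENCODE programmes differ in small details
(for example, whether a zero is written as a space or a backtick).

```python
import binascii


def uu_encode(data: bytes, name: str) -> str:
    """Turn arbitrary bytes into uuencoded text with begin/end lines."""
    lines = ["begin 644 " + name]          # "644" is an assumed mode field
    for i in range(0, len(data), 45):      # one line carries at most 45 bytes
        chunk = data[i:i + 45]
        line = binascii.b2a_uu(chunk, backtick=True)
        lines.append(line.decode("ascii").rstrip("\n"))
    # A zero-length line marks the end of the data.
    lines.append(binascii.b2a_uu(b"", backtick=True).decode("ascii").rstrip("\n"))
    lines.append("end")
    return "\n".join(lines) + "\n"


def uu_decode(text: str) -> tuple:
    """Return (file name from the header, original bytes)."""
    lines = text.splitlines()
    start = next(i for i, ln in enumerate(lines) if ln.startswith("begin "))
    name = lines[start].split(" ", 2)[2]   # header: begin <mode> <name>
    out = bytearray()
    for ln in lines[start + 1:]:
        if ln == "end":
            break
        out += binascii.a2b_uu(ln)
    return name, bytes(out)


if __name__ == "__main__":
    # Every possible byte value, standing in for a binary programme file.
    payload = bytes(range(256))
    text = uu_encode(payload, "demo.arc")
    print(text)
    name, restored = uu_decode(text)
    assert (name, restored) == ("demo.arc", payload)
```

The encoded text contains only printable ASCII characters, which is
the property that lets it pass through the gateways unharmed.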

It takes a series of three nucleotides to code for one amino acid, so
the DNA code sequence is larger than the code sequence for the enzyme.
The same is true for UUENCODED files. They are always larger than the
original programme. This is the reason that UUENCODE and ARC are
employed in conjunction with each other. ARC is used to make the file
as small as possible before the UUENCODING takes place.

So now you know why this archiving and uuencoding are taking place.
Simple, isn't it? So let's now examine the steps involved in getting a
programme from BIONET to Helsinki. At BIONET the programmes are ARCed
to make them as small as possible and then UUENCODED. When I get them
to Helsinki I reverse the process.

For historical reasons we will be looking at ARC v5.12, which was the
first popular archiver. It has many different options, but the three
most often used are A for ADD, V for VERBOSE and X for EXTRACT.

At BIONET two files are added into the archive PLASMIDC.ARC by giving
the following commands:

1) a) ARC A PLASMIDC.ARC PLASMIDC.EXE
   b) ARC A PLASMIDC.ARC PAINT.BAT

The resulting PLASMIDC.ARC file is then UUENCODED with the following
command:

2) UUENCODE PLASMIDC.ARC PLASMIDC.UUE

The files are then brought to Helsinki by FTP and downloaded to a PC
using the KERMIT protocol. Once the files are on the micro, the
procedure is reversed. The UUDECODE programme looks at the header of
the .UUE file and decodes it to give the .ARC file.

1) C:\>uudecode plasmidc.uue
   Decoding plasmidc.uue
   Destination is plasmidc.arc

The PLASMIDC.ARC file is then examined with the VERBOSE option just to
see what it contains.

As you can see, it contains the 2 files, and the EXE file has been
compressed by 16%.

   C:\>arc v plasmidc.arc
   Name          Length    Stowage    SF   Size now  Date       Time    CRC
   ============  ========  ========  ====  ========  =========  ======  ====
   PAINT.BAT           21     --       0%        21  15 Aug 87   6:55p  8674
   PLASMIDC.EXE     62754  Crunched   16%     52951  15 Aug 87   6:49p  8250
           ====  ========            ====  ========
   Total      2     62775             16%     52972

The remaining step is now to EXTRACT the files with the X option and
then run the programme on the micro, and after that you should be able
to draw simple plasmids.

2) C:\>arc x plasmidc.arc
   Extracting file: PAINT.BAT
   Extracting file: PLASMIDC.EXE

There are many different UUDECODE/UUENCODE programmes to be had, and
ARC comes in many different flavours as well. Find a set of programmes
that works well for your system and then stick to it. One STARTER KIT
which I can recommend is called UseNET CBIP Starter kit. For anyone
interested, you can get it by FTP from swan.ulowell.edu
(129.63.224.1), file ibmpc/General/starter.kit

The file is too large to put into BIOBIT, for it is about 800 lines,
but here is a short summary of the contents.

%%%%%%%%%%%%%%%%%%%%%%%%%%% EXTRACT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

UseNet CBIP Starter's Kit

The files contained herein are public domain, with the exception of
ARC-E, (c) Copyrighted by Wayne Chin and Vernon D. Buerg. ALL RIGHTS
RESERVED.

This kit contains what you will need to begin downloading files from
comp.binaries.ibm.pc, or from various archive sites.

This kit contains:
   1) Instructions
   2) BASIC source for UUDECODE
   3) Pascal source for UUDECODE
   4) C source for UUDECODE
   5) ARC-E 3.1C, in uuencoded form

You will need:
   1) Pascal or C compiler or BASIC Interpreter
   2) File editor

What to do:

You will need to split this file into 5 parts. Each part is separated
by a line stating "---CUT HERE---" and a short description. Using a
text editor, separate the parts for the Pascal source, the C source,
and the UUEncoded ARC-E program.

Then compile one of the sources to create an executable version of
uudecode, and then run it on the ARC-E file. If you called the ARC-E
file arce.uue, then type:

   uudecode arce.uue

This will give you ARC-E.COM, an executable ARC file extractor. Type
'arc-e' for options. This file will allow you to extract all other ARC
files.

%%%%%%%%%%%%%%%%%%%%%%% END %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Rob "that wasn't too painful was it?" Harper