ZUB - NNTP newsgroup scan/download

The ZUB tool allows groups on a news server to be scanned for UUENCODED, yENCoded, and some MIME-encoded files. These files are downloaded and saved in a directory specified via the program. The news server must be accessible via NNTP. Files posted in multiple articles are assembled and decoded in the appropriate order.

Screen Shot of Main Dialog

Link to:

Installation Notes

Getting Started

You have fired up Zub and have pressed the Help button to get to this HTML page. This section gives you a step-by-step guide to running the program.

  1. Press the Server button to bring up the server properties dialog. Enter the name of the news server, for example: news.myisp.net. (That name will not actually work, as there is no such server; use the news server supplied by your ISP.) If your news server requires a userid and password, then enter this data as well. The news server port defaults to 119, which is the standard port. If you are using a non-standard news server, you may need to change this. The list of radio buttons in the Select News Server area at the right side of the dialog allows you to save data for multiple news servers. Whichever radio button is selected is the one the Server, Userid, Password, and Port fields apply to. To select a new server, select that radio button and fill in the fields. This is useful if your ISP provides a news service and you also subscribe to some other news server.
     
  2. Now you need to select how many news headers you want to look at. The radio buttons in the grouping "New Messages Since" allow you to specify this. You can look at all the messages the server has available, all the messages posted since a particular date, all the messages since the last time the program was run (not sensitive to what groups were read), or the last n messages. Let's say you just want to look at the newest 6000 messages in each news group you will process. Check the "Last 'n' Articles" radio button and fill in the number 6000 in the field to the right of the button. If there are fewer than 6000 articles currently on the news server, all the articles available will be processed.
     
  3. When you press OK, the dialog will close and the data will be saved in the zub.ini file in your c:\windows directory or wherever your version of Windows stores .ini files.
     
  4. NNTP Servers have several ways of deciding to let you access the news. You have to figure out which one to use.
     
  5. Press the Groups button to specify what groups to search. Let's suppose you wish to search groups alt.binaries.multimedia, alt.binaries.mp3, and alt.binaries.pictures.motorcycles. You also will want to look at alt.binaries.pictures.military, but not right away. The Groups dialog has a large display area for groups and a column of buttons. Press the top button, titled Add, to bring up the add dialog. Type in the name "alt.binaries.multimedia", leave the Active box checked, and press OK. The dialog will close and the name will appear in the display area. The word ACTIVE: will appear to the left of the name to indicate the group is active and will be searched. Repeat the process for alt.binaries.mp3 and alt.binaries.pictures.motorcycles. Use the Add button to add the group alt.binaries.pictures.military but UN-check the "Active" check-box. When you press OK, you will notice that instead of the word ACTIVE: to the left of the name, there is the word INACT_:, indicating the group is not going to be searched.
     
  6. Let's assume at this point you don't want to mix the files from the mp3 newsgroup with the multimedia news group so you don't want to use this group right away. In the display area, click the line with the mp3 newsgroup on it and it will highlight. Press the Deactivate button and the ACTIVE: will change to INACT_: showing the group is no longer going to be searched when you run the download. The Activate button will reverse this process. Note that the list is sorted with the active groups first. It is quite common to process one group per run. Press OK to return to the main dialog and save this information in the registry.
     
  7. Let's assume you want to find MPEG movies, AVI movies, and RealAudio(tm) files. Check the boxes for MPG and AVI files and type .ra in the "User Suffix 1" field. The searches are case insensitive, so capitalization does not matter.
     
  8. Suppose you want to start the job and see what files there are out there to download. Then you want to start the files downloading and go to sleep. Check the box marked Trim File List After Analysis. Do not check the box Retrim File List After Download. If you are using a normal dial-up connection to the Internet, the Disconnect Dialup Connection After Run check box will be enabled. Since you want to go to sleep, you will want the computer to shut down the dialup connection when the download is complete. If you are using a cable modem, DSL, or LAN connection, you would not want to check this box, as those types of connections stay up all the time.
     
  9. In the Target Dir field, type the name of an existing directory, for example: C:\tmp or use the Browse button to the right of the field to search for and select a directory. This field will be saved in the registry when you press the Exit button.
     
  10. The Message Level field has valid values of 0 through 4. The value zero gives minimal messages and does not produce a log file. The value 3 reports everything which is going on, and 4 additionally saves a copy of each article body that is downloaded. A good value to use is 0 unless you wish to have a log of what was going on. In this case, 1 is a good value.
     
  11. Press the Run button. The newsgroups will be processed. You will see output in the transcript area and the progress indicator at the bottom of the dialog window. In addition, a copy of the transcript pad will be placed in the Target Directory with the name nzublog.txt. For each group, all the news headers in the group are first read and processed. For files such as MP3 and MPG, which are separated into multiple pieces, a check is made to see if all the news articles for this file are available on the server. If this is the case, the file is considered to be available for download. Single-part files, like JPG and GIF, are always complete and are thus available for download if the check box for this file type was checked.
     
  12. After the list of headers is analyzed, the list of files available for download is shown in the Trim File List dialog. You can use the mouse button to select files in this list and then either keep the selected files or keep all but the selected files. This process can be used to pare the list down to just the files you want to download. When you press the "OK" button, the files will begin to be downloaded and placed in the directory you specified in the Target Dir field. Note that the name from the header may or may not match the name of the file specified in the body of the news article.
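
The "Last 'n' Articles" choice from step 2 can be pictured as a simple range calculation. The following is a minimal sketch, not Zub's actual code. It assumes the server reports the lowest and highest article numbers it holds for a group, and the function name last_n_range is made up for illustration.

```python
def last_n_range(first: int, last: int, n: int) -> tuple[int, int]:
    """Return the (start, end) article numbers covering the newest n articles.

    first and last are the lowest and highest article numbers the server
    reports for the group (assumed inputs). If the group holds fewer than
    n article numbers, the whole available range is returned.
    """
    if last < first:
        # Empty group: nothing to narrow down.
        return (first, last)
    start = max(first, last - n + 1)
    return (start, last)
```

With first=1000, last=20000, and n=6000 this gives the range 14001 through 20000. Article numbers on a server can have gaps, so a range built this way may hold somewhat fewer than n articles.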
     

Main Dialog Elements

The objects on the main dialog are as follows:

Server Button

The Server button brings up a dialog which allows specification of the NNTP server, the userid and password to authenticate to the server and the port on which to contact the server.

Groups Button

The Groups button brings up a dialog which allows specification of the news groups to be examined. The groups are added via an "Add" button and are of the form name.name.name. For example: alt.binaries.multimedia. There is currently no newsgroup browse function available.

Cache Button

The Cache button brings up a dialog which allows specification of the local caching options. Rather than reading all the headers each run, zub can store the headers locally and just add new ones and eliminate expired ones. How many to keep and where to keep them is determined by the dialog brought up by this button.
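
The idea of adding new headers and eliminating expired ones can be sketched roughly as follows. This is an illustration only and makes several assumptions: headers are keyed by article number, the server reports its current lowest and highest article numbers, and fetch_headers is a hypothetical callable standing in for the real header request. The "how many to keep" limit from the dialog is not modeled.

```python
def refresh_cache(cached: dict[int, str], first: int, last: int,
                  fetch_headers) -> dict[int, str]:
    """Update a local header cache keyed by article number.

    cached        -- article number -> subject line from the previous run
    first, last   -- lowest and highest article numbers now on the server
    fetch_headers -- callable(start, end) returning {number: subject}
                     for that range (hypothetical helper)
    """
    # Drop headers for articles the server no longer holds.
    kept = {num: subj for num, subj in cached.items() if first <= num <= last}
    # Request only headers newer than the newest one already cached.
    start = max(kept) + 1 if kept else first
    if start <= last:
        kept.update(fetch_headers(start, last))
    return kept
```

The benefit is that a run which follows shortly after a previous one only transfers the handful of headers posted in between.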

Run Button

This button starts the scan and download process. It disables the buttons which do not make sense to change while a run is in progress.

Stop Button

This button is only enabled while a download is in progress. Pressing stop will stop the download. You may then confirm the termination or continue based on a pop up confirmation box.

Exit Button

The exit button quits the program unless a run is in progress in which case the button is disabled. The windows close button will terminate the process and close the window. The target directory and file type check boxes, message level, and user suffix types will only be saved when the exit button is pressed.

Help Button

Brings up this help file.

Re-Trim Button

This button is only active while files are being downloaded. If the headers are being downloaded for analysis or a run is not in progress, the button is not active. This button's functionality is related to the Trim File List After Analysis check box. While files are being downloaded, you may wish to go back and review and/or change the list of files to download. This button brings up the same dialog panel as the check box does. The difference is that the check box brings up the dialog panel when analysis of the news article headers is done. The button brings up the list when you press it. Note that if you cancel the download of the file currently being downloaded, the news article currently being downloaded must be finished before the cancel can occur.

MPG Check Box

Finds .mpg files. These are generally movie clips. The subject line must have .mpg in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.
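
The subject-line rule described above (suffix present, plus a part trailer such as [12/34] or (12/34), ignoring case) can be sketched with a regular expression. This is a minimal sketch under assumptions, not Zub's own matching code: the name parse_subject is hypothetical, and taking the last trailer on the line when several appear is a choice made here for illustration.

```python
import re

# Matches a trailer such as [12/34] or (12/34). For brevity the pattern
# does not insist that the opening and closing brackets are the same kind.
PART_RE = re.compile(r"[\[(](\d+)/(\d+)[\])]")

def parse_subject(subject: str, suffix: str):
    """Return (part, total) if the subject mentions the suffix and carries
    a part trailer, otherwise None. The suffix test ignores case."""
    if suffix.lower() not in subject.lower():
        return None
    matches = PART_RE.findall(subject)
    if not matches:
        return None
    part, total = matches[-1]  # assumption: the last trailer is the part counter
    return int(part), int(total)
```

For example, parse_subject("My Clip - holiday.MPG [12/34]", ".mpg") yields part 12 of 34, while a subject with no trailer yields None.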

AVI Check Box

Finds .avi files. These are generally movie clips. They are similar to MPEG files, but are compressed better. The subject line must have .avi in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

MOV Check Box

Finds .mov files. These are generally Quicktime movie clips. The subject line must have .mov in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

WMV Check Box

Finds .wmv files. These are a proprietary Microsoft format similar to mpg and usually contain movie clips. The subject line must have .wmv in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

MP3 Check Box

Finds .mp3 files. These are MPEG layer 3 files and contain music or other sounds. The subject line must have .mp3 in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

WAV Check Box

Finds .wav files. These files are generally music or other sounds. The subject line must have .wav in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

WMA Check Box

Finds .wma files. These are a proprietary Microsoft format similar to mp3 and usually contain music or other sounds. The subject line must have .wma in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

OOG Check Box

Finds .oog files. These are an open source format similar to mp3 and usually contain music or other sounds. Many Linux computers use this format. The subject line must have .oog in it and it must have a trailer showing which part this is out of how many. For example: [12/34] or (12/34). The searches are not case sensitive.

JPG Check Box

Finds .jpg files. These are generally pictures. The subject line must have .jpg in it. It does not have to have a part identifier, although if it has one, it is processed. The searches are not case sensitive.

GIF Check Box

Finds .gif files. These are generally pictures or drawings. The subject line must have .gif in it. It does not have to have a part identifier, although if it has one, it is processed. The searches are not case sensitive.

ZIP Check Box

Finds .zip files. These can be just about anything. The subject line must have .zip in it. It does not have to have a part identifier, although if it has one, it is processed. The searches are not case sensitive.

RAR Check Box

Finds files with suffixes of .rar, .r**, .p**, .s**, and .slv. These files can be processed by the WinRAR program. The ** in .r**, .p**, and .s** stands for two digits, as in .r01, .r02, etc. The searches are not case sensitive. Note that Zub only collects the pieces. Use WinRAR to combine them.
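
The suffix family listed above can be expressed as one case-insensitive pattern. The sketch below is illustrative only. It assumes the test is applied to a file name that ends with the suffix, which is a simplification of how Zub examines subject lines, and the name is_rar_piece is hypothetical.

```python
import re

# .rar or .slv, or .r / .p / .s followed by exactly two digits,
# anchored at the end of the file name.
RAR_RE = re.compile(r"\.(rar|slv|[rps]\d{2})$", re.IGNORECASE)

def is_rar_piece(filename: str) -> bool:
    """True if the file name ends in one of the RAR-related suffixes."""
    return RAR_RE.search(filename) is not None
```

Names such as movie.rar, movie.R01, and movie.p12 match, while movie.zip and movie.r1 do not.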

User Suffix 1 Field

This field specifies some other type of file to search for, such as .gif or .exe files. The searches are not case sensitive. The files may or may not have the trailing part/of-parts identifier on the subject line.

User Suffix 2 Field

Same as User Suffix 1

Trim File List After Analysis Check Box

The zub program first looks at all the headers to find the list of files it can download. These are the files it can find all the pieces for. After this list is generated, you may want to review this list and trim it. If this check box is checked, the program will pause and run a dialog to allow you to review and trim the list. After the dialog is closed, downloading of the files which were not trimmed out begins. If you do not want the program to pause, do not check this box. The program will then download everything it can. Note that if a file to be downloaded already exists in the target directory, the file is not downloaded. However, the first article for the file has already been downloaded when this is detected. This is because the name found in the Subject: line may be different than what actually appears in the file specification.
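
The notion of "files it can find all the pieces for" can be sketched as a grouping step over parsed subject lines. This is a minimal illustration under assumptions, not the program's actual logic: each article is taken to have already been reduced to a (filename, part, total) tuple, and the name complete_files is hypothetical.

```python
from collections import defaultdict

def complete_files(articles):
    """articles: iterable of (filename, part, total) tuples taken from
    parsed subject lines. Returns the sorted names of the files for which
    every part from 1 through total was seen."""
    seen = defaultdict(set)
    totals = {}
    for name, part, total in articles:
        seen[name].add(part)
        totals[name] = total
    return sorted(
        name for name, parts in seen.items()
        if parts >= set(range(1, totals[name] + 1))
    )
```

A file with parts 1 and 2 of 2 is reported, while a file with only parts 1 and 3 of 3 is left out of the list.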

Retrim File List After Download Check Box

This check box is subordinate to the Trim File List After Analysis check box. Say you are looking through the list of songs to download and decide, "I want to download this one first and then go back and look at the other titles." Checking this box causes the list of files to download (minus any already downloaded) to be displayed after the downloading of the files is complete.

Delete Partial File From Failed Download Check Box

This check box tells Zub how to deal with the case where a file is being downloaded and a failure occurs reading one of the articles which are combined to form the desired file. If the check box is set, the part of the file which was already downloaded is deleted.

This situation can occur in a number of ways. One is that the connection to the news server is lost for some reason and cannot be reestablished. A second is that a required article is missing. In theory, this should not happen, as Zub checks to see that articles exist for all the pieces needed to build a file. What actually happens is that the headers for all the news articles are cached in an index file on the news server. When Zub requests the list of headers, part or all of this cached list is sent. This list is not always up to date with the actual list of unexpired articles. The most common reason for not being up to date involves articles posted to multiple groups. Most news servers are smart enough to save one copy of the article and create soft links to the copy in the other groups in which the article is posted. It is possible for the article to be expired from the base group before the other groups, resulting in a broken soft link. The article appears to be there until you try to access it.
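
The effect of the check box can be sketched as cleanup on failure. The code below is an illustration under stated assumptions, not Zub's implementation: fetch_parts stands in for the real article retrieval, and ArticleError and download_file are hypothetical names.

```python
import os

class ArticleError(Exception):
    """Hypothetical error raised when an article cannot be read."""

def download_file(path, fetch_parts, delete_partial=True):
    """Write the decoded parts of one file to path.

    fetch_parts    -- iterable yielding each decoded part as bytes; it may
                      raise ArticleError or OSError part way through
    delete_partial -- mirrors the check box: remove the incomplete file
                      when a part fails
    Returns True on success, False on failure.
    """
    try:
        with open(path, "wb") as out:
            for chunk in fetch_parts:
                out.write(chunk)
        return True
    except (ArticleError, OSError):
        # The file is already closed here, so it is safe to remove it.
        if delete_partial and os.path.exists(path):
            os.remove(path)
        return False
```

Leaving the box unchecked corresponds to delete_partial=False, which keeps whatever was written before the failure so that it can be inspected or completed by hand.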

After Completing Download Run - Disconnect Dialup Connection Check Box

If this box is checked, when the run is complete, any dialup connections are terminated. This is useful if the run will extend into the night and you will be asleep when the run completes. Note that if the dialup DLLs are not installed on your machine, this option is disabled. This generally means you are connected to the Internet via a LAN. If you are connected via a DSL or cable modem, the check box may be enabled but will not disconnect the connection to the Internet, as these connections are permanent.

After Completing Download Run - Shutdown Computer Check Box

If this box is checked, when the run is complete, the computer is shut down. This is useful if the run will extend into the night and you will be asleep when the run completes. If you have an ATX motherboard, the machine will be powered off. Otherwise it will be left in the "You May Now Turn Off Your Computer" state.

Join Binary Split Check Box

Some downloadable files, especially .mpg files, get posted as single articles which must be joined after being converted from UUENCODE format to the original binary format. They tend to have names like abc.mpg.001, abc.mpg.002, and so on. After download they can be joined to form abc.mpg. If this box is checked, zub will attempt to find files of this format and join them. If the box is not checked, the individual abc.mpg.001 type files will be downloaded and you can join them later.

Unlike single files which are split across multiple news articles, zub does not know exactly how many pieces there should be. As a result, it takes the highest numbered piece and considers this to be the number of pieces. If all the pieces of the file from 001 to this number are available, the set is assumed to be complete. If your news server does not keep a large number of articles retained, you may have to leave this box unchecked and download the pieces of the file over several days and join them yourself. The other alternative is to get yourself an account at a large news service like Usenet News Server which retains lots of articles in each newsgroup.
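
The "highest numbered piece" rule can be sketched as follows. This is a minimal illustration that assumes three-digit numeric suffixes as in the abc.mpg.001 example. The name joinable_sets is hypothetical, and the real program may apply further checks.

```python
import re

# base name, a dot, then a three-digit piece number (abc.mpg.001)
SPLIT_RE = re.compile(r"^(?P<base>.+)\.(?P<num>\d{3})$")

def joinable_sets(filenames):
    """Return {base_name: piece_count} for every split set whose pieces
    001 through the highest number seen are all present."""
    pieces = {}
    for name in filenames:
        m = SPLIT_RE.match(name)
        if m:
            pieces.setdefault(m.group("base"), set()).add(int(m.group("num")))
    result = {}
    for base, nums in pieces.items():
        highest = max(nums)  # treated as the total, since no count is posted
        if nums >= set(range(1, highest + 1)):
            result[base] = highest
    return result
```

The sketch also shows the weakness of the rule: if only pieces 001 through 060 of a 200-piece file are on the server, the set still looks complete.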

Another place you might want to turn this checkbox off is in processing the end pieces only of a long .mpg file. For example, say with the checkbox on, you download and combine the first 60 pieces of a file which you later discover has 200 pieces. You would like to download the rest of the file and add it to what you have already downloaded. Turn the check box off, download all additional individual binary files (abc.mpg.061, abc.mpg.062, ...), and use the DOS copy command in a .BAT file to combine them. The format of the copy command to append the extra pieces is:
copy/b "abc.mpg" + "abc.mpg.061"
copy/b "abc.mpg" + "abc.mpg.062"
and so on.
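
For readers who prefer a script to a .BAT file, the same append operation can be sketched in Python. This is only an equivalent illustration of the copy/b commands above. The function name append_pieces is made up, and the caller is assumed to list the pieces in the correct order.

```python
import shutil

def append_pieces(target, pieces):
    """Append each piece file, in the order given, to the end of target."""
    with open(target, "ab") as out:        # "ab" = append in binary mode
        for piece in pieces:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, out)
```

A call such as append_pieces("abc.mpg", ["abc.mpg.061", "abc.mpg.062"]) has the same effect as the two copy commands shown.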

Target Dir Field

This is the directory where the collected files will be stored. If a file is going to be downloaded and it already exists in the directory, it is not copied. In addition, a copy of the transcript log is placed in this directory with the name nzublog.txt. Two copies of the log are maintained. The -1 copy is the log from the previous run. The specified target directory must exist.
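
Keeping the previous run's log alongside the current one can be sketched as a simple rotation. This is an assumption-laden illustration: the file name nzublog-1.txt used for the previous copy is a guess made for the example, since only "the -1 copy" is stated above, and rotate_log is a hypothetical name.

```python
import os

def rotate_log(target_dir, name="nzublog.txt", previous="nzublog-1.txt"):
    """Move the existing log to the 'previous' name (an assumed name),
    then open a fresh log for the current run and return the file object."""
    current = os.path.join(target_dir, name)
    old = os.path.join(target_dir, previous)
    if os.path.exists(current):
        os.replace(current, old)  # overwrites any older previous copy
    return open(current, "w")
```

Only two generations are kept this way: each new run discards the log from two runs ago.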

Target Dir Browse Button

This button brings up a find folder dialog which allows you to specify the folder to save into.

Message Level

This numeric field has valid values of 0 through 4. Zero generates only error messages. One is the standard listing. Two and Three dump most of the dialog which goes on between the application and the NNTP server with 3 dumping more detailed information. Message level 4 dumps the same data as 3 but also saves each of the article bodies downloaded to create the file. This is useful if some song you want does not seem to download correctly. You can look at the pieces and put it together by hand. This is also useful if you want to contact the author about a file which will not download correctly.
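
One way to picture the levels is as a threshold: each message has a kind, and it is shown when the chosen level is at least that kind. The sketch below is an interpretation of the description above rather than Zub's code, and all names in it are hypothetical.

```python
# Message kinds, numbered to line up with the level that first shows them.
ERROR, STANDARD, DIALOG, DIALOG_DETAIL = 0, 1, 2, 3

def should_print(message_kind: int, level: int) -> bool:
    """A message is shown when its kind does not exceed the chosen level."""
    return message_kind <= level

def keeps_article_bodies(level: int) -> bool:
    """Level 4 shows what level 3 shows and also saves article bodies."""
    return level >= 4
```

Under this reading, level 1 shows errors and the standard listing but none of the server dialog, and level 4 differs from level 3 only in saving the article bodies.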

Transcript Pad

Messages are written to the transcript pad as the download progresses. The messages are also written to nzublog.txt in the Target Directory.

Progress Indicator

This indicator marks the progress of each group being processed and each file being downloaded.

Program Issues (Anomalies, Bugs, Etc.)

The following lists known problems and future enhancements for Zub.

  1. MIME Base64 encoding is supported, but if the MIME headers are convoluted enough, the file may not be decoded properly. Zub will read all the news articles, but never recognize a start point.
  2. Binary split files where each part is held in multiple news articles are not assembled. If a file is split with a binary split program and each of the pieces is split across multiple news articles, the binary pieces are not reassembled. There are variations in the way these pieces get posted which make the assembly process unreliable. This situation is addressed by not attempting to assemble the binary pieces. The pieces get stored as abc.mpg.001, abc.mpg.002, etc.
  3. Workarounds: If you set the message level to 4 on the main dialog before you begin downloading, Zub will make a copy of each news article which is downloaded. These will be created in the target directory. You can then use other methods to extract the data.