Code review and programming practices

This is an example of the type of help Nimcha Consulting can provide. Programming mentoring and helping do job optimization requires code review. And doing code review has different approaches. On one hand we can recommend changes that purely effect speed and efficiency; and on the other hand we can recommend changes to make code easier to turn over to less experienced programmers. Just because you can code some tricky algorithm doesn't mean you should.

The following is a repgen that a CU wrote and had trouble with. They were trying to pull a transaction count by share and loan types for branch 01 and they were not getting any results.

Click here to view source

Issues and other discussion points:

1) The CU stated they were not getting any output. Taking a quick look through the code shows no PRINT or COL statements anywhere but the TOTAL division. The printing in the TOTAL division was conditional on the Share/Loan Tran count being greater than 0. Since they were not getting output the count is never greater than 0. Looking back at the PRINT division shows that the count is incremented based on INDEXS/INDEXL.

Tracking INDEXS/INDEXL you can see the problem was in lines 31-39. Because INDEXS/INDEXL was used in the FOR loops, upon exiting the loop the values would be 100. That places it outside the range of the array they were storing the information. Total time to find (but not fix)the bug was less than 60 seconds!

2) One fix could be to initialize INDEXS/INDEXL after lines 31-39. But it is better to evaluate these lines and then change what they do. The purpose of these lines is to load the Share/Loan description into an Array so that it can be used in the TOTAL division. Doing it as written in the PRINT division under a TARGET of ACCOUNT causes the Repgen to iterate through Parameters and load the array with the same information for every single ACCOUNT that passes the SELECT division. Needless to say this is a waste of CPU. Since the data is used only once it should be moved to either the SETUP or TOTAL division.

To fix this then we need to weigh the options. Do we go for speed optimization or for code readability and flexibility? There are basically two choices in fixing this. Preload the array and reference it later. Or just reference the parameters on the fly. Preloading is very quick but does take up a bit of memory and does involve code someone needs to "maintain". Referencing the parameters on the fly uses no memory and the code should be very easy to understand. However, referencing on the fly does mean the data is "moved" for each item. So if this is just a small number then the time to do it is small. But if the number of items could be large then you might not want to do this.

In this particular repgen the number is small enough that there are no speed concerns. So really the choices just come down to aesthetics. Do you need to ever reference the data before the TOTAL division? Will this code ever be maintained by inexperienced programmers?

3) Lines 31-39 are also inefficient as well as redundant. In addition to moving them to the SETUP division they should be rewritten. There are two separate loops being used to pull the information from Parameters. This can be rewritten to use only one loop. Since using INDEXS/INDEXL is completely arbitrary and indeed is the source of the original "bug", it would be better to use some other variable to loop on. Additionally, even though there is no code advantage for doing this it is good practice to not use variable names for multiple things. One should be able to read a variable name and know exactly what it is used for. So for example in this loop I would use something more generic:
31:  FOR XXX = 0 TO 99
32:   DO
33:    SHAREDESC(XXX)=GETDATACHARACTER(GETDEFAULTSHARE,XXX,55)
34:    LOANDESC(XXX)=GETDATACHARACTER(GETDEFAULTLOAN,XXX,77)
35:   END					    
     
The time savings of making one loop instead of two is not much but the point is that it could be written more efficiently. This is a useful change for future programmers. Also, even with the small CPU usage decrease it has always been my opinion that a hundred such changes will possibly be noticeable in huge time consuming operations like Month End.

4) Lines 28-29 set the INDEXS and INDEXL to the respective Share/Loan types. But they are not actually on a Share or Loan at the moment so the data is invalid. These lines need to be moved into the actual SHARE/LOAN loops in order to be set correctly.

5) Lines 22-24 The SELECT division is problematic because it will exclude results the CU is actually asking for. The stated purpose of the Repgen is to calculate the number of transactions at a Branch during a particular time frame. The SELECT division excludes closed Accounts so any Account that is closed at run time but that had Transactions at the Branch during the period being counted will be missed. It should be changed to include Accounts opened during the period regardless of current status.

6) Lines 41 and 54 Similar to number 5, the exclusion of closed Shares/Loans will cause Transactions to be missed. It should include Shares/Loans opened during the period.

7) Lines 11-14 The declared Array size is way to big and is using memory needlessly. I think they had to set it to 999 instead of 99 because of the original bug. Since INDEXS/INDEXL was accidentally incremented to 100 they probably needed to increment the array from 99 to something else in order to not get an "Array out of Bounds" error. They should have been able to use 100 but they chose 999 instead. Essentially they are now using something like 270K of memory just to store just a tiny bit of information.