Mr. Cluey : Special Topics : Enumerated Data Types
Programmers often refer to literals, like 7, as Magic Numbers. Usually, we don't care that we are passing a 7, but instead we care about what the data means. This is especially true when dealing with data that represents an abstraction.
For instance, suppose we are testing Micro$oft Confessor, a simple GUI application which allows the user to pick the sins he has committed in order to receive absolution.
Now, in computer terms, a sin is a fairly abstract concept. We could represent a sin in any of a number of ways, but let us suppose, for the moment, that we decide to assign to each sin a number. Each time we need to specify a particular sin, we pass the appropriate number.
Confess(5) ;
testcase Confess( integer sinCrnt ) { ... }
This fits the model described, but is not very good practice. For instance, is it immediately obvious that 5 means Envy? Heck, if you weren't sure what you were looking at, you might think 5 means second-degree murder. Besides, the use of the literal makes it very difficult to see the bug here (we meant to pass Envy, but 5 is actually Sloth).
integer Pride = 1 ;
integer Wrath = 2 ;
integer Envy = 3 ;
integer Greed = 4 ;
integer Sloth = 5 ;
integer Lust = 6 ;
integer Gluttony = 7 ;
Confess(Envy) ;
testcase Confess(integer sinCrnt ) { ... }
Considerably improved. Now anybody who picks up this code can tell that we are intending to confess Envy. The fact that Envy is represented by a number is mostly hidden from view.
Unfortunately, this doesn't hide the integer implementation as well as we would like. We might decide later that we want to change the way sins are implemented - that's where this article is going, after all. We don't want to be specifying that Pride is an integer - Pride is a sin.
type SIN is integer ;
SIN Pride = 1 ;
SIN Wrath = 2 ;
SIN Envy = 3 ;
SIN Greed = 4 ;
SIN Sloth = 5 ;
SIN Lust = 6 ;
SIN Gluttony = 7 ;
Confess(Envy) ;
testcase Confess(SIN sinCrnt ) { ... }
Much improved - if we later decide that we want to change what a SIN means, we don't have to rip everything apart - just the type declaration and the values need be changed.
The first downside to this code is that changes to the list of sins involve a lot of work. Eventually, Marketing is going to make it onto the list as a sin, probably tucked in between Envy and Greed. To insert Marketing, you also have to renumber every sin that follows it. Not a terrible ordeal here, but soon we are going to be working with large data domains.
The remaining downside is that all we have managed to do is hide the integer implementation. If you write
Print(Greed) ;
you get a 4 in the results file. Not very meaningful. Maybe context will be enough to tell you what is going on. Then again, maybe not.
Now, you could easily replace the integers with strings - which do print. Just give each the same value as the name of the variable, and uniqueness is guaranteed. Of course, this does give up on writing beautiful loops like
for sinCrnt = Pride to Gluttony { ... }
Enumerated datatypes have all of the advantages of strings and integers, and some additional benefits that we will take advantage of later. First, let's show the previous example, defining SIN as an enum.
type SIN is enum
{
    Pride ,
    Wrath ,
    Envy ,
    Greed ,
    Sloth ,
    Lust ,
    Gluttony
} ;
Confess(Envy) ;
testcase Confess(SIN sinCrnt )
{
    //The special examples that we talked about
    //This prints Greed, just like the string does,
    //without having to do any assignment at all
    Print( Greed ) ;
    //This loop still works
    for sinCrnt = Pride to Gluttony { ... }
    //Use a SIN as a key to an array
    Prayer[Wrath] ;
    //Furthermore, if you get passed the data from
    //somewhere else, you can convert it to a sin.
    //For instance, this prints Lust
    Print([SIN]6) ;
    ...
}
Furthermore, when the time comes to add Marketing to the list, you just add it in at the correct place. All of the numbers move down by one spot, but as far as the code, and the results, are concerned the change is completely invisible.
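The effect of that insertion can be tried outside of Partner. What follows is a minimal sketch in Python, used only as a runnable analogue of the 4Test enum above; the `Sin` class, the `confess` function and the upper-case member names are assumptions made for illustration, not part of the original script.

```python
from enum import IntEnum, auto


class Sin(IntEnum):
    # auto() numbers the members 1, 2, 3, ... in declaration order,
    # so no value is ever typed by hand.
    PRIDE = auto()
    WRATH = auto()
    ENVY = auto()
    MARKETING = auto()  # newly inserted; every member below shifts by one
    GREED = auto()
    SLOTH = auto()
    LUST = auto()
    GLUTTONY = auto()


def confess(sin: Sin) -> str:
    # Callers name the sin; the number behind it never appears.
    return f"Confessing {sin.name}"


message = confess(Sin.ENVY)
# A loop over the whole domain picks up the new member on its own.
names = [sin.name for sin in Sin]
```

One caveat of this sketch: a raw number recorded before the insertion (say, 4) now maps to a different member, so only code that refers to sins by name is insulated from the change.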
The possibilities really are endless, but in my experience, most of them have a single common trigger. Any time you find yourself creating an arbitrary representation of a piece of data, consider whether you would be best served by using an enumerated type.
Uniqueness of the elements is important. The values the enums take on are integer only, so if you need a float representation, you're out of luck here.
Think for a moment about how exception codes work - you are seeing an enumeration in action. Did you ever call Button.Click (MOUSE_RIGHT)? If you want to test different paths for reaching a test state (for instance, invoking a dialog by picking a menu versus pushing a button), create an enumeration with a value for each option ( INVOKE_BUTTON, INVOKE_MENU ).
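The invocation-path idea might look like the following. This is a hedged sketch in Python rather than 4Test, and everything in it except the two option names is hypothetical: `InvokeMethod`, `open_dialog` and the returned strings are stand-ins for whatever actually drives your application.

```python
from enum import Enum


class InvokeMethod(Enum):
    INVOKE_BUTTON = "button"
    INVOKE_MENU = "menu"


def open_dialog(method: InvokeMethod) -> str:
    # A real test would drive the GUI here; this stub only reports the path taken.
    if method is InvokeMethod.INVOKE_BUTTON:
        return "pushed the toolbar button"
    if method is InvokeMethod.INVOKE_MENU:
        return "picked the menu item"
    raise ValueError(f"unhandled invocation path: {method!r}")


# Run the same check once per path by looping over the enumeration.
paths_taken = {method.name: open_dialog(method) for method in InvokeMethod}
```

Adding a third way to reach the dialog then means adding one member and one branch; the loop that runs every path does not change.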
Do your programmers return error codes in the app that you can look at? With a bit of work, you can get Partner to translate the values for you, and report that your library is returning an HTTP_INTERNET_EXCEPTION rather than merely reporting a decimal value that someone will have to look up.
Converting numeric values to their enumerated equivalents is easy to get right most of the time. The cast operator does the trick:
SIN sinCrnt = [SIN]7 ;
There are two cases where the simple cast causes your scripts to break.
When the value that you are attempting to cast isn't represented in the enumerated list, your cast will throw an exception. You need to trap this exception. Well, technically, you don't need to - you can just let your script blow out until you can update your variable list. It is more productive to plan your script so that it continues; obviously, your script is expecting an enumerated value, so you should supply it with an easily identified dummy value.
type SIN is enum
{
    NEW_SIN = -666 ,
    Pride = 1 ,
    Wrath ,
    Envy ,
    Greed ,
    Sloth ,
    Lust ,
    Gluttony
} ;
SIN Sin( long lValue )
{
    SIN sinCrnt ;
    do
        sinCrnt = [SIN]lValue ;
    except
        sinCrnt = NEW_SIN ;
    return (sinCrnt) ;
}
You'll note that we've encapsulated the conversion in a single routine - we don't want these do...except blocks to propagate all over the code. The routine itself is straightforward: try the conversion and, if it throws an exception, use the dummy value instead.
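For readers who want to experiment with the pattern away from Partner, here is a rough Python equivalent. It is an analogue under stated assumptions, not the 4Test routine: the names `Sin` and `to_sin` are hypothetical, and Python signals the failed conversion with `ValueError`.

```python
from enum import IntEnum


class Sin(IntEnum):
    NEW_SIN = -666  # dummy member for numbers we do not recognise
    PRIDE = 1
    WRATH = 2
    ENVY = 3
    GREED = 4
    SLOTH = 5
    LUST = 6
    GLUTTONY = 7


def to_sin(value: int) -> Sin:
    """Convert a raw number to a Sin, falling back to NEW_SIN."""
    try:
        return Sin(value)  # raises ValueError for an undeclared number
    except ValueError:
        return Sin.NEW_SIN
```

Keeping the try/except inside `to_sin` means callers never see the exception, which is the same design choice as wrapping the cast in a single routine.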
Unlike our previous examples, we've begun assigning values to the enumeration. By default, the first value in an enumeration is 1, and each subsequent value is one more than the previous.
The dummy value (NEW_SIN here) should be chosen carefully, to ensure that it will always be unique. In this simple example, conflicts are unlikely, but longer enumerations may require more care.
Error codes in particular can be a pain in the neck; there are usually a lot of them, there are frequently breaks in the sequence, and negative numbers are fairly popular, which means that the default values are going the wrong direction. In cases like these, it is probably best to just assign each enum value explicitly.
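As an illustration of explicit assignment, here is a small Python sketch. The error names and numbers are invented for the example and do not come from any real library; the point is only that negative values and gaps stop being a problem once every value is written out.

```python
from enum import IntEnum


class AppError(IntEnum):
    # Hypothetical codes: negative, non-contiguous, and with gaps,
    # so every member gets its value spelled out.
    E_OUT_OF_MEMORY = -12
    E_ACCESS_DENIED = -5
    E_NOT_FOUND = -2
    OK = 0
    W_RETRY = 10


# The declared numbers, for comparing against the application's own list.
declared = sorted(code.value for code in AppError)
# Translating a raw code back to its name.
name_for_minus_five = AppError(-5).name
```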
The code fragment above would be lovely if we didn't have to worry about changes in the application we are testing. However, new versions are a regular part of the development cycle. Code that automatically updates itself is nice, when you can manage it - failing that, the code should examine itself to see if it may need maintenance.
We achieve that here by adding a couple of tests into the conversion routine.
SIN Sin( long lValue )
{
    // This check makes sure that NEW_SIN is still unique
    if ( lValue == [long]NEW_SIN )
        LogError( "Object returned NEW_SIN ({lValue}) - time to choose a new value in sin.inc?" ) ;
    SIN sinCrnt ;
    do
        sinCrnt = [SIN]lValue ;
    except
        sinCrnt = NEW_SIN ;
    // Oops, we haven't declared a sin for this value...
    if ( sinCrnt == NEW_SIN )
        LogError( "Unrecognized sin ( {lValue} ) - is sin.inc up to date?" ) ;
    return (sinCrnt) ;
}
Several months from now, the application that you are testing will change. It may break your script; worse yet - it might not. Tests that generate false positives are particularly dangerous. Fortunately, these error messages that you will have forgotten about will jog your memory.
Perfect.
The conversion function above is simple and elegant. Unfortunately, it's also buggy - it will occasionally match an integer to the wrong enumerated value. There is (as of 4.2) a bug in the implementation of enumerated data types.
You would expect that
( MY_SIN == [SIN]lValue )
implies
( [long]MY_SIN == lValue )
but alas, it isn't always true.
Well, there's no help for it. We'll just have to check that the value we get is correct. Fortunately, we have the ValuesOfEnum function to help us should we not get the right hit the first time.
SIN Sin( long lValue )
{
    // This check makes sure that NEW_SIN is still unique
    if ( lValue == [long]NEW_SIN )
        LogError( "Object returned NEW_SIN ({lValue}) - time to choose a new value in sin.inc?" ) ;
    SIN sinCrnt ;
    do
        sinCrnt = [SIN]lValue ;
    except
        sinCrnt = NEW_SIN ;
    //But even if we get through the conversion without throwing an exception,
    //we have no guarantees that Partner has selected the correct value. So confirm
    //that the value is correct before exiting.
    if ( lValue != [long]sinCrnt )
    {
        //As we feared, Partner has given us a mismatch.
        //Clear the false match, then iterate through all of
        //the values of the enum, looking for a real match.
        sinCrnt = NEW_SIN ;
        SIN sinTemp ;
        for each sinTemp in ValuesOfEnum( SIN )
        {
            if ( lValue == [long]sinTemp )
            {
                sinCrnt = sinTemp ;
                break ;
            }
        }
    }
    // Oops, we haven't declared a sin for this value...
    if ( sinCrnt == NEW_SIN )
        LogError( "Unrecognized sin ( {lValue} ) - is sin.inc up to date?" ) ;
    return (sinCrnt) ;
}
Of course, we could have just iterated the list to begin with, which would have sidestepped the bug completely at the cost of a slightly less straightforward algorithm and possibly a performance hit.
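That iterate-first alternative is short enough to sketch. The version below is a Python analogue, not 4Test; `Sin` and `to_sin_by_scan` are hypothetical names, and Python's own enum conversion is not known to have the mismatch described above, so the scan is shown purely to illustrate the shape of the algorithm.

```python
from enum import IntEnum


class Sin(IntEnum):
    NEW_SIN = -666  # dummy member for numbers we do not recognise
    PRIDE = 1
    WRATH = 2
    ENVY = 3
    GREED = 4
    SLOTH = 5
    LUST = 6
    GLUTTONY = 7


def to_sin_by_scan(value: int) -> Sin:
    """Find the member whose number equals value by checking each one in turn."""
    for sin in Sin:
        if sin.value == value:
            return sin
    # Nothing matched, so hand back the dummy member.
    return Sin.NEW_SIN
```

The scan touches every member in the worst case, which is where the possible performance hit comes from on long enumerations.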