
Abstract Data Types

Both C and C++ (and now Fortran 90) permit abstract data types. While I don't want to embed details of how the languages do this, it will be helpful to include illustrations, so I'll use a pseudo-language. From the example already alluded to, let's create a buoy report data type:

type buoyreport {
  real lat, lon;
  real t_air, pressure, t_sea, u_wind, v_wind;
  integer qc_code, platform_type;
  character name[60];
  integer yy, mm, dd, hh;
}

In other words, we've declared that there is something called a buoy report, and that every buoy report has certain data associated with it. Inside the program, we can then declare

 buoyreport x[1000];

exactly as we might say REAL x(1000) to declare an array of 1000 real numbers. In this case, we've declared an array of 1000 buoy reports.
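For readers who would like to see the same idea in an actual language, here is a minimal sketch in C++. The type name BuoyReport, the use of float for "real", and the zero default values are my own choices for illustration, not part of the pseudo-language above.

```cpp
#include <array>

// C++ rendering of the pseudo-language buoyreport type.
// Assumptions: "real" becomes float, and every element starts at zero.
struct BuoyReport {
    float lat = 0.0f, lon = 0.0f;
    float t_air = 0.0f, pressure = 0.0f, t_sea = 0.0f;
    float u_wind = 0.0f, v_wind = 0.0f;
    int qc_code = 0, platform_type = 0;
    char name[60] = "";
    int yy = 0, mm = 0, dd = 0, hh = 0;
};

// Counterpart of "buoyreport x[1000];": an array of 1000 buoy reports.
std::array<BuoyReport, 1000> x;
```

Each element of x is a complete report, so x[i].lat and x[i].t_sea always belong to the same observation.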

This is only a start on using abstract data types, but let's see what we can gain by it. First, we can do windowing more readily, e.g.

if (abs(x[i].lat - lat_ref) < 0.2 AND abs(x[i].lon - lon_ref) < 0.2) then
C  process data that are in range
endif

Note that an element of a buoyreport is referred to by giving the name of the variable (x), a dot, and the element name. This is more flexible and readable than having separate arrays for latitude, longitude, etc. (the Fortran 77 alternative). By itself, though, this is not a tremendous savings.
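The windowing test might look something like the following in C++. This is only a sketch: the function names in_window and select_in_window, the trimmed-down BuoyReport, and the single tolerance parameter tol are assumptions made for the illustration, and longitude wrap-around at the date line is ignored.

```cpp
#include <cmath>
#include <vector>

// Trimmed-down report: only the elements this example needs.
struct BuoyReport {
    double lat;
    double lon;
    double t_sea;
};

// True if the report lies within tol degrees of the reference point
// in both latitude and longitude (no wrap-around handling).
bool in_window(const BuoyReport& r, double lat_ref, double lon_ref, double tol) {
    return std::fabs(r.lat - lat_ref) < tol && std::fabs(r.lon - lon_ref) < tol;
}

// Collect the reports that fall inside the window.
std::vector<BuoyReport> select_in_window(const std::vector<BuoyReport>& x,
                                         double lat_ref, double lon_ref,
                                         double tol) {
    std::vector<BuoyReport> hits;
    for (const BuoyReport& r : x) {
        if (in_window(r, lat_ref, lon_ref, tol)) {
            hits.push_back(r);
        }
    }
    return hits;
}
```

Because each report carries its own latitude and longitude, there is no way for the two to get out of step, which is the risk with separate parallel arrays.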

But now consider the processing that we would do. This will likely be done in a subroutine. In F77, we would have to give the subroutine an argument list like SBR(lat, lon, temp, etc. etc. etc.). And since the subroutine may not be in the same file as the calling routine, we get the added joy of keeping two separate codes synchronized. The joy increases when we've got multiple sorts of buoys, each with slightly different data elements. By declaring the abstract data type, we avoid all of that. The subroutine simply takes a buoyreport as its argument. There's nothing extra to synchronize between the codes, as we've defined what buoyreports are. If we later add different types of reports, say to separate cman_reports, drifter_reports, etc., we only have to define what these entities are. In C and F-90, each of these must be declared in full separately; C++ permits us to create a new type by saying that it is just like an old type except that it also has ... whatever.
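As a rough C++ illustration of both points (passing a whole report, and building a new type from an old one), consider the sketch below. The names DrifterReport, drogue_depth, and air_sea_difference are hypothetical, chosen only for the example.

```cpp
// Trimmed-down report type for the example.
struct BuoyReport {
    double lat = 0.0, lon = 0.0;
    double t_air = 0.0, t_sea = 0.0;
    int qc_code = 0;
};

// "Just like a BuoyReport, except that it also has" one more element.
// drogue_depth is a hypothetical extra field.
struct DrifterReport : BuoyReport {
    double drogue_depth = 0.0;
};

// The routine takes the whole report; the argument list does not change
// when elements are added to BuoyReport.
double air_sea_difference(const BuoyReport& r) {
    return r.t_air - r.t_sea;
}
```

A DrifterReport can be handed to air_sea_difference unchanged, since it is a BuoyReport with something extra.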

There's an additional level of utility in the abstract data types. That is, an abstract data type can include data elements which are themselves abstract data types. In the buoyreport example, I had yy, mm, dd, hh as data elements. These record the time of the observation. It would be more convenient to simply say date. We can do exactly that: declare a data type called 'date' and then make observation_time a variable of that type in the buoy report. Now, when we write our function to do a time difference check, we can call time_diff(time1, time2, delta), where time1, time2, and delta are the observation time, the reference time, and the time difference (we may choose to express that in hours, or perhaps to make it a 'date' type variable itself).
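A sketch of the nested date type in C++ follows. Several things here are assumptions rather than anything prescribed above: the difference is returned in whole hours instead of through a third argument, years are stored with four digits, and the helper days_from_civil is a standard piece of Gregorian calendar arithmetic that I have added so the example is complete.

```cpp
// A date to the hour. Assumption: four-digit years.
struct Date {
    int year;
    int month;
    int day;
    int hour;
};

// The report now contains a Date rather than separate yy, mm, dd, hh.
struct BuoyReport {
    double lat;
    double lon;
    Date observation_time;
};

// Days since 1970-01-01 in the proleptic Gregorian calendar.
long days_from_civil(long y, int m, int d) {
    y -= (m <= 2) ? 1 : 0;                      // treat Jan and Feb as part of the previous year
    const long era = (y >= 0 ? y : y - 399) / 400;
    const long yoe = y - era * 400;             // year of era, 0..399
    const long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Time difference t1 - t2 in hours.
long time_diff(const Date& t1, const Date& t2) {
    const long days = days_from_civil(t1.year, t1.month, t1.day)
                    - days_from_civil(t2.year, t2.month, t2.day);
    return days * 24 + (t1.hour - t2.hour);
}
```

A time check on a report r then reads time_diff(r.observation_time, reference), and nothing outside time_diff needs to know how a Date is stored.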

This nesting of abstract types permits a very natural construction of the program. For my pre-BUFR SSMI - SDR data processing, for instance, I have a nesting several levels deep, each level of which makes internal sense. The setup is that there's an orbit of data. Each orbit of data has an sdr_header followed by a number of data records (sdr_header and data_record are then abstract data types we define). Each data record includes a scan_header and a data_block. The scan header includes various elementary data pieces which we look up, declare, and are finished with. The data_block includes some information on the mode of the satellite, and then 64 pieces of 'long data'. Long data turns out to be latitude-longitude of observation, surface type, position within the scan line, 7 antenna temperatures, and three pieces of 'short data'. Short data is latitude-longitude of observation, surface type, and two 85 GHz antenna temperatures.

We can now work at whatever level we're interested in. If we want the data that characterize the orbit itself, they're available in the sdr_header. If we want to know about the scan line, it's in the scan_header. If we're looking for 19v antenna temperatures, those are in the long_data, and so on. The great virtue here is that we don't need to know all at once how to locate every piece of information. Further, we aren't prevented from changing some of the pieces. For example, we could discover that long_data includes a quality control flag (or decide to add one ourselves). Rather than rewriting every piece of code we have, we merely change the definition of long_data, and add some code only where we use or change this new element.
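The nesting described for the SSMI data could be laid out in C++ roughly as follows. This is a sketch only: the element types, the placeholder contents of the two headers (orbit_number, scan_number), and the function mean_antenna_temp are all assumptions, and I have not tried to say which of the seven channels is which.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct ShortData {
    float lat = 0.0f, lon = 0.0f;
    int surface_type = 0;
    std::array<float, 2> t85{};             // two 85 GHz antenna temperatures
};

struct LongData {
    float lat = 0.0f, lon = 0.0f;
    int surface_type = 0;
    int scan_position = 0;
    std::array<float, 7> antenna_temps{};   // seven antenna temperatures
    std::array<ShortData, 3> short_data{};  // three pieces of short data
};

struct ScanHeader { int scan_number = 0; }; // placeholder contents
struct SdrHeader  { int orbit_number = 0; };// placeholder contents

struct DataBlock {
    int satellite_mode = 0;
    std::array<LongData, 64> long_data{};   // 64 pieces of long data
};

struct DataRecord {
    ScanHeader scan_header;
    DataBlock data_block;
};

struct Orbit {
    SdrHeader header;
    std::vector<DataRecord> records;
};

// Works at the data-record level: mean of one channel along a scan line.
double mean_antenna_temp(const DataRecord& rec, std::size_t channel) {
    double sum = 0.0;
    for (const LongData& spot : rec.data_block.long_data) {
        sum += spot.antenna_temps[channel];
    }
    return sum / rec.data_block.long_data.size();
}
```

Adding, say, a quality control flag to LongData would leave mean_antenna_temp untouched; only code that reads or sets the flag would need writing.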

In the buoy report example, I listed the time of observation as being year, month, day, hour. Suppose now we discovered that the buoys reported time to the minute. In the F-77 case, we have to add an argument to all our subroutines (in the right order, with the right declarations within our called routines, etc.). With abstract data types, there is no change to any routines except internal to the few routines that compute things with dates. Every other routine merely refers to a date. They don't care exactly what a 'date' is, but if it matters, they can correctly pass a date along to the routine (like time_diff) which does. Only in the interior of time_diff do we have to rewrite any programming. (Aside - note that there is still some work to be done. The newer programming capabilities don't get rid of that fact. What they do, as here, is to let us make the changes in the fewest possible places.)

Abstract data types are definitely good things, in the right cases. A bad case is when the problem really does consist of only a few things which are elementary data types. One example would be the dynamics section of a CFD model. We have arrays of temperature, pressure, velocity, and so on. We could construct an abstract data type:

type dynamics {
  real temperature(360, 180), salt(360, 180), u(360, 180), v(360, 180); 
  real pressure(360, 180); 
}
dynamics model;
  or
type dynamics {
  real temperature, salt, u, v, pressure;
}
dynamics model(360, 180);

but it would be hard to argue that we make the code any clearer by doing this. Abstract data types are a tool, not necessarily the solution.
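For comparison, the two layouts above might be written in C++ as in the sketch below. The names DynamicsFields, DynamicsPoint, and idx, and the choice of flat row-major storage, are assumptions for the illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t NX = 360;
constexpr std::size_t NY = 180;

// Flat index for grid point (i, j), with i varying fastest.
inline std::size_t idx(std::size_t i, std::size_t j) {
    return j * NX + i;
}

// First layout: one object holding whole-grid arrays.
struct DynamicsFields {
    std::vector<double> temperature, salt, u, v, pressure;
    DynamicsFields()
        : temperature(NX * NY), salt(NX * NY), u(NX * NY),
          v(NX * NY), pressure(NX * NY) {}
};

// Second layout: one small object per grid point,
// stored as std::vector<DynamicsPoint>(NX * NY).
struct DynamicsPoint {
    double temperature = 0.0, salt = 0.0, u = 0.0, v = 0.0, pressure = 0.0;
};
```

Access is then fields.temperature[idx(i, j)] in the first layout and points[idx(i, j)].temperature in the second; neither is obviously clearer than a plain temperature array.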


Robert Grumbine
2000-06-14