Midterm is next week! Second lecture this week will be recap

Data Types (continued)

Enumeration Types

An enumeration type is one in which all of the possible values are listed in the definition. The enumeration constants are typically implicitly assigned integer values.

enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun};

Design Issues:

Coercion = automatic conversion to another data type

  • Is an enumeration constant allowed to appear in more than one type definition?
  • Are enumeration values coerced to integer?
  • Are any other type coerced to an enumeration type?

Examples

Enumeration types increase readability and reliability.

C++ Example

AllowedNot Allowed
myDay = Mon;myDay = 4;
myDay++;
  • A constant can only appear in one enumeration type
  • Enumeration values are coerced to int.
  • No type is coerced to an enumeration type

Ada

  • Constants are allowed to appear in more than one enumeration type
  • No type coercion either way

Java

  • Enumeration types are classes
    • i.e. they can have instance data fields, constructors, and methods
  • No type coercion either way

C#

  • Like C++ but no coercion either way

Subrange Types

A subrange type is a contiguous subsequence of an ordinal type. Subrange types increase readability.

Ada

  • Subtypes are not new types; rather, they are new names for possibly restricted versions of existing types
type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
subtype Weekdays is Days range Mon...Fri;
 
Day1 : Days;
Day2 : Weekdays;
...
Day2 := Day1; // potential run-time error

Array Types

An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate.

  • i.e. arrays have indexes and you get a element by its index in the array

In many languages all elements of an array are required to be of the same type.

Design Issues

  • What types are legal for subscripts?
  • Are subscripting expressions in element references range checked?
  • When are subscript ranges bound?
  • When does array allocation take place?
  • Are jagged or rectangular multidimensional arrays allowed?
  • Can arrays be initialized when they have storage allocated?
  • What kind of slices are allowed?

Arrays and Indices

Arrays are sometimes called finite mappings because they map an index to a value

  • Accessing elements a[i] vs a(i)
    • Most languages other than Fortran and Ada use square brackets
  • Type of subscripts
    • Often a subrange of integers, sometimes of the form 0...n
    • Ada allows any ordinal type as subscripts
  • Most contemporary languages do range checks at run time.
    • Perl returns undef but no error is reported
  • Perl allows negative subscripts, in which case the subscript value is an offset from the end of the array

Array Categories

There are 5 categories of arrays, based on the binding to subscript ranges, the binding to storage, and from where the storage is allocated.

Static Array

  • Array in which the subscript ranges are statically bound, and storage allocation is static

Fixed Stack-dynamic Array

  • Array in which the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution

Stack-dynamic Array

  • Array in which bot hthe subscript ranges and the storage allocation are dynamically bound at elaboration time (run time)

Fixed Heap-dynamic Array

  • Similar to a fixed stack-dynamic array, in that the subscript ranges and the storage binding are both fixed after storage is allocated. However, both bindings are done during using the heap

Heap-dynamic Array

  • Array in which the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array’s lifetime

Array Examples

  • Arrays in C and C++ declared using the static modifier are static
  • Arrays in C and C++ declared without the static modifier are fixed stack-dynamic
  • Arrays in Ada can be stack dynamic
  • In C and C++ arrays can be treated as pointers
    • Using malloc and free creates fixed heap-dynamic arrays
  • In Java all arrays are fixed heap-dynamic arrays
  • Perl has heap-dynamic arrays

Array Initialization

Examples:

  • Fortran
Integer, Dimension(3) :: List = (/0, 5, 5/)
  • C, C++, Java, and C# (C-based languages)
int list[] = {4, 5, 7, 83}
  • Ada
Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);

Array Operations

  • C based languages do not provide any array operations
    • except library methods (all Java array operations are from libraries)
  • Ada allows array assignments and catenations
  • Fortran includes a number of elemental array operations
    • These operations perform a certain operation on the elements pairwise
  • APL is the most powerful array-processing language ever.
    • + overloaded for numbers, vectors, and matrices
    • ⍨V reverses the elements of
    • ⍨M reverses the columns of
    • ⌽M reverses the rows of
    • ⍉M transposes (its rows become its columns and vice versa)
    • ÷M inverts
    • Many more…

Rectangular and Jagged Arrays

A rectangular array is a multidimensional array in which the rows have the same length, and the columns have the same length.

A jagged array is one in which the length of the rows do not need to be the same.

C, C++, and Java support jagged arrays but not regular arrays.

  • Typical access: myArray[3][7]

Fortran, Ada, and C# support rectangular arrays.

  • Typical access: myArray[3,7]

Slices

A slice of an array is some substructure of that array.

Examples

Python
  • vector[3:6]: subrange
  • mat[1]: one row
Fortran (like Python) plus
  • Mat(:,2): one column

Implementation of Array Types

Single dimension array

ArrayDetails
Element typeelement_size
Index type
Index lower boundlower_bound
Index upper bound
Addressbase_address
address(list[k]) = base_address + (k - lower_bound) * element_size

Multi-dimensional Array

Multidimensional ArrayDetails
Element typeelement_size
Index type
Number of dimensions
Index range 0row_lb, n
Index range n - 1col_lb
Addressbase_address
// recommended for when doing by hand
address(a[i, j]) = base_address + (((i-row_lb) * n) + (j-col_lb)) * element_size;
 
// same calculation, just makes it run faster at runtime
address(a[i, j]) = base_address 
	- (((row_lb * n) + col_lb) * element_size) // can be reused
	+ ((i * n) + j) * element_size; // only part that has to be calculated every time

Record Types

A record is an aggregate of data elements in which individual elements are identified by names and access through offsets from the beginning of the structure.

Records have been part of all of the most popular programming languages, except pre-90 versions of Fortran, since the early 1960s, when they were introduced by COBOL.

In C, C++, and C#, records are supported with the struct data type.

Basically the first step of objects

  • Only difference between the two is that there is no inheritance in a record, but there is in an object

Design Issues

  • What is the syntactic form of references to fields?
  • Are elliptical references allowed?

Definition of Records

COBOL

01 EMPLOYEE-RECORD.
	02 EMPLOYEE-NAME.
	    05 FIRST PICTURE IS X(20).
	    05 Middle PICTURE IS X(10).
		05 LAST PICTURE IS X(20).
	02 HOURLY-RATE PICTURE IS 99V99.

Java, C++

In Java and C#, records can be defined as data classes, with nested records defined as nested classes. Data members of such classes serve as the record fields.

Ada

type Employee_Name_Type is record
	First : String(1..20);
	Middle : String(1..10);
	Last : String(1..20);
end record;
 
type Employee_Record_Type is record
	Employee_Name : Employee_Name_Type;
	Hourly_Rate : Float;
end record;

References to Record Fields

COBOL

Field references have the form field_name OF record_name_1 OF … OF record_name_n

MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
  • COBOL allows elliptical references to record fields
  • In an elliptical reference, the field is named, but any or all of the enclosing record names can be omitted, as long as the resulting reference is unambiguous in the referencing environment
FIRST, FIRST OF EMPLOYEE-NAME, FIRST OF EMPLOYEE-RECORD

Ada, C, C++

Employee_Record.Employee_Name.Middle

Implementation of Record Types

FieldNameTypeOffsetAddress
Field 1Name 1Type 1Offset 1Address 1
Field nName nType nOffset nAddress n

Union Types

A union is a type whose variables may store different type values at different times during program execution

In C and C++, the union construct is used to specify union structures. The union in these languages are called free unions, because programmers are allowed complete freedom from type checking in their use.

Design Issues

Are union types typed checked?

Examples

C++/C

union flexType { // can either be allowed to be an int or a float
  int intEl;
  float floatEl;
};
union flexType el1;
float x;
...
el1.intEl = 27;
x = el1.floatEl; // not typed checked

Technically the above example is not allowed in C++, but it will still let you do it

Type checking of unions requires that each union construct include a type indicator. Such an indicator is called a tag, or discriminant, and a union with a discriminant is called a discriminated union.

F# (typechecking)

type intReal =
	| IntValue of int
	| RealValue of float;;
 
let ir1 = IntValue 17;;
let ir2 = RealValue 3.4;;
 
let printType value =
    match value with
	| IntValue value -> printfn "It is an integer"
	| RealValue value -> printfn "It is a float";;

The printType method can be used to check which type(? subtype? the int vs float in type intReal) the parameter is.

Ada (and similar in Pascal)

Type Shape is (Circle, Triangle, Rectangle);
Type Colors is (Red, Green, Blue);
Type Figure (Form : Shape) is
	record
	    Filled : Boolean;
	    Color : Colors;
	    case Form is
			when Circle =>
		        Diameter : Float;
	        when Triangle =>
	            Left_Side : Integer;
	            Right_Side : Integer;
	            Angle : Float;
	        when Rectangle =>
	            Side_1 : Integer
	            Side_2 : Integer;
		end case;
	end record;

End of lecture 1

Pointer and Reference Types

A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference a memory cell.

A value of a reference type is a reference to another value.

Design Issues

  • What is the scope and lifetime of a pointer variable?
  • What is the life time of a heap-dynamic varialbe (the value a pointer references)?
  • Are pointers restricted as to the type of value to which they can point?
  • Are pointers used for dynamic storage management, direct addressing, or both?
  • Should the language support pointer types, reference types, or both?
    • In Java, we have only references, no pointers

Pointer Operations

Languages that provide a pointer type usually includes two fundamental pointer operations: assignment and dereferencing.

Dereferencing of pointers can be either explicit or implicit.

In C++, it is explicitly specified with the asterisk (*) as a prefix unary operator

j = *ptr
graph TD
    ptr["ptr"] --> |"7080"| dynamicBlock["7080<br>An anonymous dynamic variable"]
    dynamicBlock --> |"206"| valueBlock["j"]

Pointer Problems

Dangling Pointers

int * arrayPtr1;
int * arrayPtr2 = new int[100];
arrayPtr1 = arrayPtr2;
delete [] arrayPtr2;
// Now, arrayPtr1 is dangling, because the heap storage
// to which it was pointing has been deallocated.

Lost Heap-Dynamic Variables

A lost heap-dynamic variable is an allocated heap-dynamic variable that is no longer accessible to the user program (garbage).

Reference Types

A reference type variable is similar to a pointer, with one important and fundamental difference: A pointer refers to an address in memory, while a reference refers to an object or value in memory.

  • it is not sensible ot do arithmetic on references

C++

int result = 0;
int &ref_result = result;
// ...
ref_result = 100;
  • Java reference variables can be assigned to refer to different class instances; they are not constants. All Java class instances are referenced by reference variables.

Evaluation

  • Problem of dangling pointers
  • Problem of garbage
  • Problems of heap management
    • Allocation
    • Garbage detection
    • Deallocation
  • Pointers have been compared with the goto

"Their introduction into high-level languages has been a step backward from which we may never recover" Hoare (1973)

  • Pointers are essential in some kinds of programming applications
  • The references of Java and C# provide some of the flexibility and the capabilities of pointers, without the hazards

Type Checking

Type checking is the activity of ensuring that the operands of an operator are of compatible types.

  • Any automatic conversion is called a coercion

A type error is the application of an operator to an operand of an inappropriate type

  • If all bindings of variables to types are static in a language, then type checking can nearly always be done statically
  • Dynamic type binding requires type checking at run time, which is called dynamic type checking

Strong Typing

A programming language is strongly typed if type errors are always detected. This requires that the types of all operands can be determined, either at compile time or at run time.

  • C and C++ are not strongly types languages because both include union types, which are not type checked
  • ML and F# are strongly typed
  • Ada is nearly strongly typed, i.e., it allows programmers to breach the type checking rules by specifically requesting that type checking is suspended for a particular type conversion
  • Java and C# are nearly strongly typed
  • JavaScript is dynamically typed/weakly typed

Type Equivalence

Type equivalence is a strict form of type compatibility; compatibility without coercion.

Name type equivalence

Two variables have equivalent types if they are defined either in the same declaration or in declarations that use the same type name.

Structure type equivalence

Two variables have equivalent types if their types have identical structures

type Celsius = Float;
type Fahrenheit = Float; 

// vs
type Celsius is new Float;
type Fahrenheit is new Float;

Everything up until this point will be on the lecture during week 5