Francais | English | Espanõl

Subroutine

From Wikipedia, the free encyclopedia

Jump to: navigation, search

In computer science, a subroutine (function, method, procedure, or subprogram) is a portion of code within a larger program, which performs a specific task and is relatively independent of the remaining code. The syntax of many programming languages includes support for creating self contained subroutines.

There are many advantages to breaking a program up into subroutines, including:

  • reducing the redundancy in a program (ie, by replicating useful functionality, such as mathematical functions),
  • enabling reuse of code across multiple programs,
  • decomposing complex problems into simpler pieces (this is claimed to improve maintainability and ease of extension),
  • improving readability of a program,
  • hiding or regulating part of the program (see Information hiding)

The components of a subroutine may include:

  • a body of code to be executed when the subroutine is called
  • parameters that are passed to the subroutine from the point where it is called
  • a value that is returned to the point where the call occurs

Some programming languages, like Pascal and FORTRAN, distinguish between functions, which return values, and subroutines or procedures, which do not. Other languages, like C and LISP, do not make this distinction, and treat those terms as synonymous. The name method is commonly used in connection with object-oriented programming, specifically for subroutines that are part of objects.

Maurice Wilkes, Stanley Gill, and David Wheeler are credited with the invention of the subroutine (which they referred to as the closed subroutine).

Contents

[edit] History

The first use of subprograms was in assembly languages that did not have a call instruction. On these computers, subroutines needed to be called by a sequence of lower level instructions, possibly implemented as a macro. These instructions typically modified the program code rather like the infamous COBOL alter statement, modifying the address of a branch at a standard location so that it behaved like an explicit return instruction. Even with this cumbersome approach subroutines proved so useful that soon most architectures provided instructions to help with subroutine calls, leading to explicit call instructions.

When an assembly language program executes a call, program flow jumps to another location, but the address of the next instruction (that is, the instruction that follows the call instruction in memory) is kept somewhere to use when returning. The IBM System/360 saved this address in a processor register, relying on convention to save and restore registers and return addresses in memory associated with individual subroutines, then using branches to the address specified in the register to accomplish a subroutine return. Another approach in early FORTRAN compilers in machines like the HP-2100 which did not support stacks was to reserve a location at or before the start of a subroutine to save the return location. This did not require a separate stack, but did not support recursion since there is only one return storage location per subroutine.

Most modern compilers use a push-down stacks. They support recursive subroutine calls since each call is given a fresh new location to store the return address. They were typically used in all but a few later architectures. In a stack based architecture, the return address is 'pushed' as a point of return on the stack. The most extreme "stack-based" architecture machines not only use a stack for return addresses, but even adding two operands. The subroutine exits by 'popping' a return value from the top of the stack, which reads the previously pushed return address and jumps to it, so that program flow continues immediately after the call instruction. Most RISC and VLIW architectures save the return address in a link register (as the IBM 360 did), but simulate a stack with load and store instructions rather than with push and pop instructions. The disadvantage of such as scheme is that the stack can overflow if recursion takes place at too many levels, or if the variables on each stack frame are too large. If there is not sufficient stack space, and there is no recursion, a tree-walk can be simulated with an iterative algorithm, storing return locations at each tree node, as was done with Lotus 1-2-3 and work-alike clone, The Twin which were based on PC's with very limited stack space.

[edit] Technical overview

A subprogram, as its name suggests, behaves in much the same way as a complete computer program, but on a smaller scale. Typically, the caller waits for subprograms to finish and continues execution only after a subprogram "returns". Subroutines are often given parameters to refine their behavior or to perform a certain computation with given values.

[edit] Side-effects

In most imperative programming languages, subprograms may have so-called side-effects; that is, they may cause changes that remain after the subprogram has returned. Usually, compilers cannot predict whether a subprogram has a side-effect or not, but can determine if a subprogram calls no other subprograms, or at least no other subprograms that have side-effects. In imperative programming, compilers usually assume every subprogram has a side-effect to avoid complex analysis of execution paths. Because of its side-effects, a subprogram may return different results each time it is called, even if it is called with the same arguments. A simple example is a subprogram that implements a pseudorandom number generator; that is, a subprogram that returns a random number each time it is called.

Such behavior is invalid in a strict mathematical sense. An exception to this common behaviour is found in pure functional programming languages such as Haskell, where subprograms can have no side effects, and will always return the same result if repeatedly called with the same arguments. Such languages typically have only functions, since subroutines that do not return a value are useless if they cannot have any other effect either. In functional programming writing to a file is also a side effect.

[edit] Low-level implementation

This section deals with the modern implementation of having subroutine data stored on one or more stacks.

Due to usage of a stack, a subroutine can call itself (see recursion) or other subroutines (nested calls), and of course it can call the same subroutine from several distinct places. Assembly languages generally do not provide programmers with such conveniences as local variables or subroutine parameters. They get to be implemented by passing values in registers or pushing them onto the stack (or another stack, if there is more than one).

When there is just one stack, the return addresses must be placed in the same space as the parameters and local variables. Hence, a typical stack may look like this (for a case where function1 calls function2):

  • Previous stack data
  • Function1 local variables
  • Parameters for function2
  • Function1 return address (of the instruction which called function2)
  • Function2 local variables

This is with a forwards-growing stack — on many architectures the stack grows backwards in memory. Forward and backwards-growing stacks are useful because it is quite practical to have two stacks growing towards each other in a common scratch space, using one mainly for control information like return addresses and loop counters and the other for data. (This is what Forth does.)

The parts of the program which are responsible for the entry into and exit out of the subroutine (and hence, the setting up and removal of each stack frame) are called the function prologue and epilogue.

If the procedure or function itself uses stack handling commands, outside of the prologue and epilogue, e.g. to store intermediate calculation values, the programmer needs to keep track of the number of 'push' and 'pop' instructions so as not to corrupt the original return address.

[edit] C and C++ examples

In the C and C++ programming languages, subprograms are referred to as "functions" (or "methods" when associated with a class). Note that these languages use the special keyword void to indicate that a function takes no parameters (especially in C) and/or does not return any value. Note that C/C++ functions can have side-effects, including modifying any variables whose addresses are passed as parameters (i.e. "passed by reference"). Examples:

 void function1(void) { }

The function above does absolutely nothing; it is called with "function1();"

 int function2(void)
   { return 5; }

This function returns the number 5, and can be called with "function2();"

 char function3 (int number)
   { char selection[] = {'S','M','T','W','T','F','S'}; 
     return selection[number];
   }

This function converts a number between 0 to 6 into the initial letter of the corresponding day of the week, namely 0 → 'S', 1 → 'M', ..., 6 → 'S'. It called with "function3(number);".

 void function4 (int* pointer_to_var)
   { (*pointer_to_var)++; }

This function does not return a value but modifies the variable whose address is passed as the parameter; it would be called with "function4(&variable_to_increment);".

[edit] Local variables, recursion and re-entrancy

A subprogram may find it useful to make use of a certain amount of "scratch" space; that is, memory used during the execution of that subprogram to hold intermediate results. Variables stored in this scratch space are referred to as local variables, and the scratch space itself is referred to as an activation record. An activation record typically has a return address that tells it where to pass control back to when the subprogram finishes.

A subprogram may have any number and nature of call sites. If recursion is supported, a subprogram may even call itself, causing its execution to suspend while another nested execution of the same subprogram occurs. Recursion is a useful technique for simplifying some complex algorithms, and breaking down complex problems. Recursive lanaguages generally provide a new copy of local variables on each call. If the programmer desires the value of local variables to stay the same between calls, they can be declared "static" in some languages, or global values or common areas can be used.

Early languages like Fortran did not initially support recursion because variables were statically allocated, as well as the location for the return address. Most computers before the late 1960s such as the PDP-8 did not have support for hardware stack registers.

Modern languages after ALGOL such as Pl/1 and C almost invariably use a stack, usually supported most modern computer instruction sets to provide a fresh activation record for every execution of a subprogram. That way, the nested execution is free to modify its local variables without concern for the effect on other suspended executions in progress. As nested calls accumulate, a call stack structure is formed, consisting of one activation record for each suspended subprogram. In fact, this stack structure is virtually ubiquitous, and so activation records are commonly referred to as stack frames.

If a subprogram can function properly even when called while another execution is already in progress, that subprogram is said to be re-entrant. A recursive subprogram must be re-entrant. Re-entrant subprograms are also useful in multi-threaded situations, since multiple threads can call the same subprogram without fear of interfering with each other.

In a multi-threaded environment, there is generally more than one stack. An environment which fully supports coroutines or lazy evaluation may use data structures other than stacks to store their activation records.

[edit] Overloading

Sometimes, it is desirable to have one function to be able to take in different series of parameters. When a function with the same name can accept different parameters, it is said to be overloaded. For example, a subroutine might construct an object that will accept directions, and trace its path to these points on screen. There are a plethora of parameters that could be passed in to the constructor (colour of the trace, starting x and y co-ordinates, trace speed). If the programmer wanted the constructor to be able to accept only the color parameter, then he could call another constructor that accepts only color, which in turn calls the constructor with all the parameters passing in a set of "default values" for all the other parameters (X and Y would generally be centered on screen or placed at the origin, and the speed would be set to another value of the coder's choosing).

[edit] Conventions

A number of conventions for the coding of subprograms have been developed. It has been commonly preferable that the name of a subprogram should be a verb when it does a certain task, an adjective when it makes some inquiry, and a noun when it is used to substitute variables and such.

Experienced programmers recommend that a subprogram perform only one task. If a subprogram performs more than one task, it should be split up into more subprograms. They argue that subprograms are key components in maintaining code and their roles in the program must be distinct.

Some advocate that each subprogram should have minimal dependency on other pieces of code. For example, they see the use of global variables as unwise because it adds tight-coupling between subprograms and global variables. If such coupling is not necessary at all, they advise to refactor subprograms to take parameters instead. This practice is controversial because it tends to increase the number of passed parameters to subprograms.

See programming practice for a more detailed discussion of programming disciplines.

[edit] Related terms and clarification

Different programming languages and methodologies possess notions and mechanisms related to subprograms:

  • Subroutine is practically synonymous with "subprogram." The former term may derive from the terminology of assembly languages and Fortran.
  • Function and Procedure are also synonymous with "subprogram" - with the distinction (in some programming languages) that "functions", generate return values and appear in expressions, where "procedures", generate no return values and appear in statements. Hence, a subprogram that calculates the square root of a number would be a "function" (eg: y = sqrt(x);) where a subprogram to print out a number might be a "procedure" (eg: print(x);). This is not a distinction found in all programming languages and notably the C family of programming languages use the two terms interchangeably. See also: Command-Query Separation.
  • Predicate is, in general, a boolean-valued function (a function that returns a boolean). In logic programming languages, often all subroutines are called "predicates", since they primarily determine success or failure.
  • Method or Member function is a special kind of subprogram used in object-oriented programming that describes some behaviour of an object.
  • Closure is a subprogram together with the values of some of its variables captured from the environment in which it was created.
  • Coroutine is a subprogram that returns to its caller before completing.
  • Event handler, or simply "handler," is a subprogram that is called in response to an "event", such as a computer user moving the mouse or typing on the keyboard. The AppleScript scripting language simply uses the term "handler" as a synonym for subprogram. Event handlers are often used to respond to an Interrupt - in which case they may be termed an Interrupt handler.
  • Threaded code makes code even more compact. It uses a small interpreter to execute subroutines that consist of lists of subroutine addresses. The lowest levels of subroutine are the only machine language.

[edit] See also

[edit] References

  • Wilkes, M. V., Wheeler, D. J., Gill, S. (1951). Preparation of Programs for an Electronic Digital Computer. Addison-Wesley.
  • Donald Knuth. Fundamental Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89683-4. Section 1.4.1: Subroutines, pp.186–193.bs:Podrutina

cs:Funkce (programování) da:Funktion (programmering) de:Funktion (Programmierung) et:Funktsioon (programmeerimine) es:Subrutina fa:زیرروال fr:Fonction (informatique) is:Undirforrit it:Funzione (informatica) he:פונקציה (תכנות) nl:Subprogramma ja:関数 (プログラミング) pl:Podprogram pt:Subrotina ru:Функция (программирование) fi:Aliohjelma sv:Subrutin vi:Chương trình con uk:Підпрограма zh:子程序

Personal tools