Associative array
From Wikipedia, the free encyclopedia
An associative array (also map, hash, dictionary, finite map, lookup table, and in query-processing an index or index file) is an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with the key "bob" is 7, we say that our array maps "bob" to 7. Associative arrays are very closely related to the mathematical concept of a function with a finite domain. As a consequence, a common and important use of associative arrays is in memoization.
From the perspective of a programmer using an associative array, it can be viewed as a generalization of an array: While a regular array maps integers to arbitrarily typed objects (integers, strings, pointers, and, in an OO sense, objects), an associative array maps arbitrarily typed objects to arbitrarily typed objects. (Implementations of the two data structures, though, may be considerably different.)
The operations that are usually defined for an associative array are:
- Add: Bind a new key to a new value
- Reassign: Bind an old key to a new value
- Remove: Unbind a key from a value and remove it from the key set
- Lookup: Find the value (if any) that is bound to a key
Contents |
[edit] Examples
One can think of a telephone book as an example of an associative array, where names are the keys and phone numbers are the values. Another example would be a dictionary where words are the keys and definitions are the values. A database is a sort of generalized associative array.
[edit] Data structures for associative arrays
Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).
[edit] Efficient representations
There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:
- Hash tables have faster average lookup and insertion time (O(1)), while some kinds of balanced binary trees have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in realtime systems, but trees can be useful in high-security realtime systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
- Hash tables can have more compact storage for small value types, especially when the values are bits.
- There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
- Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
- Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
- Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range.
[edit] Association lists
A simple but generally inefficient type of associative array is an association list, often called an alist for short, which simply stores a linked list of key-value pairs. Each lookup does a linear search through the list looking for a key match.
Strong advantages of association lists include:
- No knowledge is needed about the keys, such as an order or a hash function.
- For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
- Insertions are done in constant time by consing the new association to the head of the list.
[edit] Specialized representations
If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, they're particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing tables.
String-keyed maps can avoid extra comparisons during lookups by using tries.
[edit] Multimap
A variation of the map (associative array) is the multimap, which is the same as map data structures, but allows a key to be mapped to more than one value. Formally, a multimap can be thought of as a regular associative array that maps unique keys to nonempty multisets of values, although actual implementation may vary. C++'s Standard Template Library provides the "multimap" class for the sorted multiset, and SGI's STL provides the "hash_multimap" class, which implements a multimap using a hash table.
[edit] Language support
Associative arrays are known by many names in different programming languages. In Smalltalk, Objective-C and Python they are called dictionaries; in Perl and Ruby they are called hashes; in Java they are called Maps (see Map) and in Common Lisp they are called hash tables (since Common Lisp typically uses this implementation). In PHP all arrays can be associative, except that the keys are limited to integers and strings and can only be a single level of subscripts. In MUMPS, all arrays are associative, with multiple dimensional arrays (ie: with variable numbers of keys), MUMPS keys are strings with values at any dimensional level allowed to store string values.
In the scripting language Lua, associative arrays, called tables, are used as the primitive building block for all data structures, even arrays. Likewise, in JavaScript, all objects are associative arrays. In MUMPS, the associative arrays are typically stored as B-trees.
Associative arrays can be implemented in any programming language, and one or more implementations of them is typically found either built into the language or the standard library distributed with it (C is a notable exception, as neither the language nor the standard library directly provide one).
[edit] Awk
Awk has built-in, language-level support for associative arrays.
For example:
phonebook["[[Sally Smart]]"] = "555-9999" phonebook["[[John Doe]]"] = "555-1212" phonebook["[[J. Random Hacker]]"] = "555-1337"
You can also loop through an associated array as follows:
for (name in phonebook)
{
print name, " ", phonebook[name]
}
You can also check if an element is in the associative array, and delete elements from an associative array.
Multi-dimensional associative arrays can be implemented in standard Awk using concatenation and e.g. SUBSEP:
{ # for every input line
multi[$1 SUBSEP $2]++;
}
#
END {
for (x in multi) {
split(x, arr, SUBSEP);
print arr[1], arr[2], multi[x];
}
}
Thompson AWK [1] provides built-in multi-dimensional associative arrays:
{ # for every input line
multi[$1][$2]++;
}
#
END {
for (x in multi)
for (y in multi[x])
print x, y, multi[x][y];
}
[edit] C
There is no standard implementation of an associative array in C, but a 3rd party library with BSD license is available here. POSIX 1003.1-2001 describes the functions hcreate(), hdestroy() and hsearch().
[edit] C#
Dim ht As New Hashtable()
ht.Add("testKey", "AssociatedData")
MessageBox.Show(CStr(ht("testKey")))
Hashtable ht = new Hashtable();
ht.Add("testKey", "AssociatedData");
MessageBox.Show((string) ht["testKey"]);
Dictionary<int, string> dic = new Dictionary<int, string>(); dic.Add(7, "Bond"); MessageBox.Show(dic[7]);
In the above sample the Hashtable class is only capable of associating a String key with a value of Object type. Because in .NET all types (save pointers) ultimately derive form Object anything can be put into a Hashtable, even data of different types. This could lead to errors if consuming code expects data to be of a singular type. In the above code casting is required to convert the Object variables back to their original type. Additionally, casting value-types (structures such as integers) to Object to put into the Hashtable and casting them back requires boxing/unboxing which incurs both a slight performance penalty and pollutes the heap with garbage. This changes in C# 2.0 with generic hashtables called dictionaries. There are significant performance and reliability gains to these strongly typed collections because they do not require boxing/unboxing or explicit type casts and introduce compile-time type checks.
[edit] C++
C++ also has a form of associative array called std::map (see Standard Template Library#Containers). One could create a map with the same information as above using C++ with the following code:
#include <map>
#include <string>
int main()
{
std::map<std::string, std::string> phone_book;
phone_book["Sally Smart"] = "555-9999";
phone_book["John Doe"] = "555-1212";
phone_book["J. Random Hacker"] = "553-1337";
return 0;
}
In C++, the std::map class is templated which allows keys and values to be different data types. For a given instance of the map class the keys must be of the same base type. The same must be true for all of the values. Although std::map is typically implemented using a self-balancing binary search tree, the SGI STL also provides a std::hash_map which has the algorithmic benefits of a hash table.
[edit] Cocoa/GNUstep (Objective-C)
Cocoa (API) and GNUstep handle associative arrays using NSMutableDictionary (a mutable version of NSDictionary) class cluster. This class allows assignments between any two objects to be made. A copy of key object is made before it is being inserted into NSMutableDictionary, therefore the keys must conform to the NSCopying protocol. When being inserted to a dictionary, the value object receives a retain message to increase its reference count. The value object will receive the release message when it will be deleted from the dictionary (both explicitly or by adding to the dictionary a different object with the same key).
NSMutableDictionary *aDictionary = [[NSMutableDictionary alloc] init]; [aDictionary setObject:@"555-9999" forKey:@"Sally Smart"]; [aDictionary setObject:@"555-1212" forKey:@"John Doe"]; [aDictionary setObject:@"553-1337" forKey:@"Random Hacker"];
To access assigned objects this command may be used:
id anObject = [aDictionary objectForKey:@"Sally Smart"];
All keys or values can be simply enumerated using NSEnumerator
NSEnumerator *keyEnumerator = [[aDictionary allKeys] objectEnumerator];
id key;
while (( key = [keyEnumerator nextObject] ))
{
// ... process it here ...
}
What is even more practical, structured data graphs may be easily created using Cocoa (API), especially NSDictionary (NSMutableDictionary). This can be ilustrated with this compact example:
NSDictionary *aDictionary =
[NSDictionary dictionaryWithObjectsAndKeys:
[NSDictionary dictionaryWithObjectsAndKeys:
@"555-9999", @"Sally Smart",
@"555-1212", @"John Doe",
nil], @"students",
[NSDictionary dictionaryWithObjectsAndKeys:
@"553-1337", @"Random Hacker",
nil], @"hackers",
nil];
And relevant fields can be quickly accessed using key paths:
id anObject = [aDictionary valueForKeyPath:@"students.Sally Smart"];
[edit] D
D offers direct support for associative arrays in the core language - they are implemented as hash tables. The equivalent example would be:
int main()
{
char[][char[]] phone_book;
phone_book["Sally Smart"] = "555-9999";
phone_book["John Doe"] = "555-1212";
phone_book["J. Random Hacker"] = "553-1337";
return 0;
}
Keys and values can be any types, but all the keys in an associative array must be of the same type, and the same for values.
You can also loop through all properties and associated values, i.e. as follows:
foreach (key, value; phone_book) {
writefln("Number for $name: $number\n", key, value);
}
A property can be removed as follows:
phone_book.remove("Sally Smart");
[edit] Java
In Java associative arrays are implemented as "maps"; they are part of the Java Collections Framework. Since J2SE 5.0 and the introduction of generics into Java, collections can have a type specified; for example, an associative array mapping strings to strings might be specified as follows:
Map<String, String> phoneBook = new HashMap<String, String>();
phoneBook.put("Sally Smart", "555-9999");
phoneBook.put("John Doe", "555-1212");
phoneBook.put("J. Random Hacker", "555-1337");
The get method is used to access a key; for example, the value of the expression phoneBook.get("Sally Smart") is "555-9999".
This code uses a hash map to store the associative array, by calling the constructor of the HashMap class; however, since the code only uses methods common to the interface Map, one could also use a self-balancing binary tree by calling the constructor of the TreeMap class (which implements the subinterface SortedMap, without changing the definition of the phone_book variable or the rest of the code, or use a number of other underlying data structures that implement the Map interface.
The hash function in Java is provided by the method Object.hashCode(). Since every class in Java inherits from Object, every object has a hash function. A class can override the default implementation of hashCode() to provide a custom hash function based on the properties of the object.
The Object class also contains the method equals(Object) that tests the object for equality with another object. Maps in Java rely on objects maintaining the following contract between their hashCode() and equals() methods:
- For two objects a and b
- a.equals(b) == b.equals(a)
- if a.equals(b) then a.hashCode() == b.hashCode()
In order to maintain this contract, a class that overrides equals() must also override hashCode(), and vice versa, so that hashCode() is based on the same properties (or a subset of the properties) as equals().
A further contract that the map has with the object is that the results of the hashCode() and equals() methods will not change once the object has been inserted into the map. For this reason, it is generally a good practice to base the hash function on immutable properties of the object.
[edit] JavaScript
JavaScript (and its standardized version: ECMAScript) is a prototype-based object-oriented language. In JavaScript an object is a mapping from property names to values -- that is, an associative array. The only difference is that objects have a prototype link to the object they inherit from. Doing a lookup for a property will forward the lookup to the prototype if the object does not define the property itself.
An object literal is written as { property1 : value1, property2 : value2, ... }. For example:
var myObject = { "Sally Smart" : "555-9999",
"John Doe" : "555-1212",
"J. Random Hacker" : "553-1337" };
If the property name is a valid identifier, the quotes can be omitted, e.g.:
var myOtherObject = { foo : 42, bar : false }
Lookup is written using property access notation, either square brackets, which always works, or dot notation, which only works for identifier keys:
myObject["John Doe"] myOtherObject.foo
You can also loop through all properties and associated values as follows:
for(var property in myObject) {
var value = myObject[property];
alert("myObject[" + property + "] = " + value);
}
A property can be removed as follows:
delete myObject["Sally Smart"];
Any object, including built-in objects such as Array, can be dynamically extended with new properties. For example:
Array.prototype.removeAllObjects = function () {
/* ... */
}
In modern JavaScript it's considered bad form to use the Array type as an associative array. Consensus is that the Object type is best for this purpose. The reasoning behind this is that if Array is extended via prototype and Object is kept pristine, 'for(in)' loops will work as expected on associative 'arrays'. This issue has been drawn into focus by the popularity of JavaScript frameworks that make heavy and sometimes indiscriminate use of prototype to extend JavaScript's inbuilt types.
See JavaScript Array And Object Prototype Awareness Day for more information on the issue.
[edit] KornShell 93 (and compliant shells: ksh93, bash, zsh, ...)
- Note: The latest version of bash, 3.2, doesn't support associative arrays properly yet.
Definition:
typeset -A phonebook; phonebook=(["Sally Smart"]="555-9999" ["John Doe"]="555-1212" ["J. Random Hacker"]="555-1337");
Dereference:
${phonebook["John Doe"]};
[edit] Lisp
Lisp was originally conceived as a "LISt Processing" language, and one of its most important data types is the linked list, which can be treated as an association list ("alist").
'(("Sally Smart" . "555-9999")
("John Doe" . "555-1212")
("J. Random Hacker" . "553-1337"))
The syntax (x . y) is used to indicate a consed pair. Keys and values need not be the same type within an alist. Lisp and Scheme provide operators such as assoc to manipulate alists in ways similar to associative arrays.
Because of their linear nature, alists are used for relatively small sets of data. Common Lisp also supports a hash table data type, and for Scheme they are implemented in SRFI 69. Hash tables have greater overhead than alists, but provide much faster access when there are many elements.
It is easy to construct composite abstract data types in Lisp, using the object-oriented programming features in conjunction with lists, arrays, and hash tables.
[edit] Lua
In Lua, table is a fundamental type that can be used either as array (numerical index, fast) or as associative array. The keys and values can be of any type, except nil.
A table literal is written as { value, key = value, [index] = value, ["non id string"] = value }. For example:
phone_book = {
["Sally Smart"] = "555-9999",
["John Doe"] = "555-1212",
["J. Random Hacker"] = "553-1337", -- Trailing comma is OK
}
aTable = {
"one", -- Goes to slot 1 (no slot 0)
[3] = [[C:\tmp\]], -- Jumps to slot 3
subTable = { 5, 7.5, k = true }, -- key is "subTable"
['John Doe'] = function (age) if age < 18 then return "Young" else return "Old!" end end,
}
If the key is a valid identifier (not a keyword), the quotes can be omitted.
Lookup is written using either square brackets, which always works, or dot notation, which only works for identifier keys:
aTable[1] aTable["John Doe"](45) aTable.subTable.k
You can also loop through all keys and associated values with iterators or for loops:
simple = { 1, [5] = "f", 3.14, x = 'x', ["!"] = 42 }
-- Iterate on all keys
table.foreach(simple, function (k, v) print("[" .. k .. "] = " .. v) end)
for k, v in pairs(simple) do print("[" .. k .. "] = " .. v) end
k= nil
repeat
k, v = next(simple, k)
if k ~= nil then print("[" .. k .. "] = " .. v) end
until k == nil
-- Iterate on the array part (numerical indices, stops on 2)
table.foreachi(simple, function (k, v) print("[" .. k .. "] = " .. v) end)
for k, v in ipairs(simple) do print("[" .. k .. "] = " .. v) end
for i = 1, table.getn(simple) do print("[" .. i .. "] = " .. simple[i]) end
table.getn deprecated in Lua 5.1 in favor of # operator
An entry can be removed as follows:
simple[3] = nil -- Leaves a gap table.remove(simple, 3) -- Entries are shifted. By default removes last entry.
Likewise, you can overwrite values or insert them:
simple[3] = "tree" table.insert(simple, 3, "tree") -- Entries are shifted. By default adds to end of table.
Other table operations: table.sort (in-place sort), table.concat (table to string), table.getn and table.setn (both deprecated in Lua 5.1).
[edit] MUMPS
In MUMPS every array is an associative array. The built-in, language-level, direct support for associative arrays applies to private, process-specific arrays stored in memory called "locals" as well as to the permanent, shared arrays stored on disk which are available concurrently by multiple jobs. The name for globals is preceded by the circumflex "^" to distinquish it from local variable names.
SET ^phonebook("Sally Smart")="555-9999" ;; storing permanent data
SET phonebook("John Doe")="555-1212" ;; storing temporary data
SET phonebook("J. Random Hacker")="553-1337" ;;storing temporary data
MERGE ^phonebook=phonebook ;;copying temporary data into permanent data
To access the value of an element, simply requires using the name with the subscript:
WRITE "Phone Number :",^phonebook("Sally Smart"),!
You can also loop through an associated array as follows:
SET NAME="" FOR S NAME=$ORDER(^phonebook(NAME)) QUIT:NAME="" . WRITE NAME," Phone Number :",^phonebook(NAME),!
[edit] OCaml
The OCaml programming language provides three different associative containers. The simplest is a list of pairs:
# let m = ["Sally Smart", "555-9999";
"John Doe", "555-1212";
"J. Random Hacker", "553-1337"];;
val m : (string * string) list =
[("Sally Smart", "555-9999"); ("John Doe", "555-1212");
("J. Random Hacker", "553-1337")]
# List.assoc "John Doe" m;;
- : string = "555-1212"
The second is a polymorphic hash table:
# let m = Hashtbl.create 3;;
val m : ('_a, '_b) Hashtbl.t = <abstr>
# Hashtbl.add m "Sally Smart" "555-9999";
Hashtbl.add m "John Doe" "555-1212";
Hashtbl.add m "J. Random Hacker" "553-1337";;
- : unit = ()
# Hashtbl.find m "John Doe";;
- : string = "555-1212"
Finally, functional maps (represented as immutable balanced binary trees):
# include (Map.Make(String));; ... # let m = add "Sally Smart" "555-9999" empty let m = add "John Doe" "555-1212" m let m = add "J. Random Hacker" "553-1337" m;; val m : string t = <abstr> # find "John Doe" m;; - : string = "555-1212"
Lists of pairs and functional maps both provide a purely functional interface. In contrast, hash tables provide an imperative interface. For many operations, hash tables are significantly faster than lists of pairs and functional maps.
[edit] Perl
Perl has built-in, language-level support for associative arrays. Modern Perl vernacular refers to associative arrays as hashes; the term associative array is found in older documentation, but is considered somewhat archaic. Perl hashes are flat: keys are strings and values are scalars. However, values may be references to arrays or other hashes.
A hash variable is marked by a % sigil, to distinguish it from scalar, array and other data types. A hash can be initialized from a key-value list:
%phone_book = ("Sally Smart", "555-9999",
"John Doe", "555-1212",
"J. Random Hacker", "553-1337");
Perl offers the => syntax, semantically (almost) equivalent to the comma, to make the key-value association more visible:
%phone_book = ("Sally Smart" => "555-9999",
"John Doe" => "555-1212",
"J. Random Hacker" => "553-1337");
Accessing a hash element uses the syntax $hash_name{$key} - the key is surrounded by curly braces and the hash name is prefixed by a $, indicating that the hash element itself is a scalar value, even though it is part of a hash. The value of $phone_book{"John Doe"} is "555-1212". The % sigil is only used when referring to the hash as a whole, such as when asking for keys %phone_book.
The list of keys and values can be extracted using the built-in functions keys and values, respectively. So, for example, to print all the keys of a hash:
foreach $name (keys %phone_book) {
print "$name\n";
}
One can iterate through (key, value) pairs using the each function:
while (($name, $number) = each %phone_book) {
print "Number for $name: $number\n";
}
[edit] PHP
PHP also has built-in support for Associative Arrays. In fact, this is the major kind of array used in PHP programming. An associative array can be formed in one of two ways:
$phonebook = array (); $phonebook['Sally Smart'] = '555-9999'; $phonebook['John Doe'] = '555-1212'; $phonebook['J. Random Hacker'] = '555-1337';
$phonebook = array ( 'Sally Smart' => '555-9999', 'John Doe' => '555-1212', 'J. Random Hacker' => '555-1337', // trailing comma is OK );
You can also loop through an associative array as follows:
foreach ($phonebook as $name => $number) {
echo "Number for $name: $number\n";
}
PHP has an extensive set of functions to operate on arrays.
[edit] Python
In Python, associative arrays are called dictionaries. Dictionary literals are marked with curly braces:
phonebook = {'Sally Smart' : '555-9999',
'John Doe' : '555-1212',
'J. Random Hacker' : '553-1337'}
To access an entry in python simply use the array indexing operator. For example, the expression phonebook['Sally Smart'] would return '555-9999'.
An example loop iterating through all the keys of the dictionary:
for key in phonebook:
print key, phonebook[key]
Iterating through (key, value) tuples:
for key, value in phonebook.iteritems():
print key, value
Dictionaries can also be constructed with the dict builtin, which takes a key-value list:
dict((key, value) for key, value in phonebook.items() if 'J' in key)
Dictionary keys can be individually deleted using the del statement. The corresponding value can be returned before the key-value pair are deleted using the pop method of dict types;
del phonebook['John Doe']
val = phonebook.pop('Sally Smart')
assert phonebook.keys() == ['J. Random Hacker'] # Only one key left
[edit] Ruby
In Ruby a Hash is used as follows:
phonebook = {
'Sally Smart' => '555-9999',
'John Doe' => '555-1212',
'J. Random Hacker' => '553-1337'
}
phonebook['John Doe'] produces '555-1212'
To iterate over the hash, use something like the following:
phonebook.each {|key, value| puts key + " => " + value}
Additionally, each key may be shown individually:
phonebook.each_key {|key| puts key}
[edit] Smalltalk
In Smalltalk a dictionary is used:
phonebook := Dictionary new. phonebook at: 'Sally Smart' put: '555-9999'. phonebook at: 'John Doe' put: '555-1212'. phonebook at: 'J. Random Hacker' put: '553-1337'.
To access an entry the message #at: is sent to the dictionary object.
phonebook at: 'Sally Smart'
gives
'555-9999'
[edit] TCL
In Tcl every array is an associative array.
set "phonebook(Sally Smart)" 555-9999 set "phonebook(John Doe)" 555-1212 set "phonebook(J. Random Hacker)" 553-1337
The first argument of the set command has to be enclosed by double quotes, as it contains a space. In Tcl, space is used to separate arguments. Alternatively, array setting can be grouped into a single command (keys braced because they contain whitespace):
array set phonebook {
{Sally Smart} 555-9999
{John Doe} 555-1212
{J. Random Hacker} 553-1337
}
To access an entry and put it on standard output
puts "$phonebook(Sally Smart)"
The result is here
555-9999
[edit] External links
- TreeDictionary<K,V> Implementation in C#
- NIST's Dictionary of Algorithms and Data Structures: Associative Array
- NIST's Dictionary of Algorithms and Data Structures: Association List
- SGI STL page on Associative Containers
- SGI STL page on std::map
- SGI STL page on std::hash_mapcs:Asociativní pole
de:Assoziatives Array ja:連想配列 nl:Associatieve array pl:Tablica asocjacyjna ru:Ассоциативный массив

