
Endianness

From Wikipedia, the free encyclopedia


In computing, endianness is the ordering used to represent some kind of data as a sequence of smaller units. Typical cases are the ordering of the bytes that represent an integer in a computer's memory (relative to a given memory addressing scheme) and the transmission order over a network or other medium. When specifically talking about bytes in computing, endianness is also referred to simply as byte order (or, less commonly, as byte sex).<ref>It is unclear whether this expression is also used when more than two orderings are possible. It is reported by the Jargon File: byte sex. Similarly, the manual for the ORCA/M assembler refers to a field indicating the order of the bytes in a number field as NUMSEX.</ref>

Endianness as a general concept

The term does not originate in computing but in Jonathan Swift's Gulliver's Travels, in which Swift mocked people who argued over which end of a boiled egg to open first.

Generally speaking, endianness is a particular facet of a representation format. As such, it applies to the representation(s) of integers used by computer processors, to encoding schemes such as UTF-16 and UTF-32 or to the conceptual encoding implied by some low-level algorithms (see for instance MD5 and SHA hash functions). Likewise it applies to network transmissions, where it is established by the employed protocol.
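As a small illustration of endianness in an encoding scheme, the following sketch encodes one character under both UTF-16 byte orders. The variable names are chosen for this example only; the codec names are those of the Python standard library.

```python
# Encode the same character under both UTF-16 byte orders.
text = "A"  # code point U+0041

utf16_be = text.encode("utf-16-be")  # big-endian, no byte order mark
utf16_le = text.encode("utf-16-le")  # little-endian, no byte order mark

print(utf16_be.hex())  # 0041
print(utf16_le.hex())  # 4100
```

The two results hold the same 16-bit code unit; only the order of its two bytes differs.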

LOLA: Low-Order at Low Address

In the 1970s, microcontrollers were 8-bit devices and had to string several bytes together in memory to hold values of a usable size, usually 2 bytes. Because memory addresses increment upward, a memory display ran from hex 0000 to FFFF, with the lowest addresses at the top of the page, starting at the left.

Intel stored the low-order byte (the sixteens and units hex digits) in memory first, then incremented the address and stored the high-order byte. Read from left to right, the "little", low-order byte therefore came first.

For example, decimal 64,000 = hex FA00 was stored from low address to high address as:

Address:  0000  0001  0002  ...  FFFF
Data:     00    FA    --    ...  --

The bits are not reversed, only the bytes. On screen this made the bytes appear reversed, so Intel was said to use "reverse" order, even though the addresses still incremented upward.

Motorola stored bytes the opposite way. The most significant, high-order byte was stored at the lower address, and the next address (+1) received the low-order byte. On screen this appeared in normal left-to-right mathematical order, so Motorola was said to use "forward" order, even though its storage order was the reverse of Intel's. From left to right, the "big" end came first.
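The two storage orders described above can be sketched for the 16-bit value 64,000 (hex FA00). This is only an illustration using Python's standard struct module; the variable names are made up for the example.

```python
import struct

value = 64000  # hex FA00

intel_order = struct.pack("<H", value)     # low-order byte at the low address
motorola_order = struct.pack(">H", value)  # high-order byte at the low address

print(intel_order.hex())     # 00fa
print(motorola_order.hex())  # fa00
```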

A possible mnemonic is "LOLA" for Intel and "LOHA" for Motorola:

Intel: "Low-Order at Low Address"; Motorola: "Low-Order at High Address"
or
Intel: "Little-Order at Low Address"; Motorola: "Little-Order at High Address"


A note on some non-idiomatic usages: some authors extend the word "endianness" and related terms to entities such as mail addresses, date formats and others. It should be noted, however, that such usages (which reduce endianness to a mere synonym of ordering) are most often the writer's choice and do not reflect official terminology as ratified by the corresponding standards (for instance, ISO 8601:2004 talks about "descending order year-month-day", not about "big-endian format").

Endianness and hardware

Most computer processors simply store integers as sequences of bytes, so that, conceptually, the encoded value can be obtained by simple concatenation. For an <math>n</math>-byte integer value this allows <math>n!</math> (<math>n</math> factorial) possible representations (one for each byte permutation). The two most common of them are a) increasing numeric significance with increasing memory addresses, known as little-endian, and b) its opposite, called big-endian.<ref>Note how, in these expressions, the term "end" is to be understood as "extremity", not as "last part"; big and little say which extremity is written first.</ref>
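To make the count concrete, the sketch below enumerates the byte permutations of a 4-byte value whose bytes are all distinct and picks out the two common orderings. The value and the variable names are illustrative assumptions.

```python
from itertools import permutations

value = 0x0A0B0C0D  # four distinct bytes, so every permutation is a distinct layout

big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # least significant byte at the lowest address

# Every possible ordering of the four bytes: 4! = 24 layouts.
layouts = set(permutations(big))

print(len(layouts))   # 24
print(big.hex())      # 0a0b0c0d
print(little.hex())   # 0d0c0b0a
```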

6502 processors (and their derivatives) as well as Intel x86 processors (and their clones) use the little-endian format (for this reason also called Intel format). Networks generally use big-endian; the historical reason is that this allowed routing to begin while a telephone number was still being composed.

Motorola processors have generally used big-endian. SPARC, PowerPC (which includes Apple's Macintosh line prior to the Intel switch) and System/370 also adopt big-endian.

Bi-endian hardware

Some architectures, including ARM, PowerPC (but not the PPC970/G5), DEC Alpha, MIPS, PA-RISC and IA-64, feature switchable endianness. That can improve performance or simplify the logic of networking devices and software. The word bi-endian, said of hardware, denotes the capability to compute or pass data in either of two different endian formats (usually big-endian and little-endian).

Many of these architectures can be switched via software to default to a specific endian format (usually done when the computer starts up); however, on some systems the default endianness is selected by hardware on the motherboard and cannot be changed via software (e.g., the DEC Alpha, which runs only in big-endian mode on the Cray T3E).

Note, too, that some nominally bi-endian CPUs may actually employ internal magic (as opposed to really switching to a different endianness) in one of their operating modes. For instance, some PowerPC processors in little-endian mode act as little-endian from the point of view of the executing programs but they do not actually store data in memory in little-endian format (multi-byte values are swapped during memory load/store operations). This can cause problems when memory is transferred to an external device if some part of the software, e.g. a device driver, does not account for the situation.

Discussion, background, etymology

The choice of big-endian vs. little-endian has been the subject of flame wars. The very term big-endian comes from Jonathan Swift's satirical novel Gulliver's Travels, which describes tensions in Lilliput and Blefuscu arising because a faction called the Big-endians prefer to crack open their soft-boiled eggs from the big end, contrary to Lilliputian royal edict.<ref>Gulliver's Travels, Part I, Chapter IV</ref> The terms little-endian and endianness have a similar ironic intent.<ref>Endian FAQ – Including the paper On Holy Wars and a Plea for Peace by Danny Cohen, 1 April 1980</ref>

A person's preference is usually based on which convention they studied first, on their general background and on other factors. Nevertheless, both conventions can represent exactly the same values. An often cited argument in favour of big-endian is that it is consistent with the ordering used in natural languages. But that ordering is far from universal, in both spoken and written form:

  • spoken: though most spoken languages express most numbers, especially those larger than a hundred, in a "big-endian manner"<ref>Cf. entries #539 and #704 of the Linguistic Universals Database</ref> (in English, for example, one says "twenty-four", not "four-and-twenty"), there are notable exceptions such as the German and the Dutch languages, which use "little-endian" for numbers between 21 and 99 and "mixed endianness" for larger numbers (e.g. vierundzwanzig/vierentwintig (24, literally "four-and-twenty"), and hundertvierundzwanzig (124, literally "hundred four-and-twenty")).
  • written: the Hindu-Arabic numeral system is used worldwide and is such that the most significant digits are always written to the left of the less significant ones. Writing left to right, this system is therefore "big-endian". Writing right to left, this numeral system is "little-endian". It is worth noting, also, that in quite a few languages the spoken order of numerals is inconsistent with how they appear written; and that in some languages, such as Hebrew, it is common to interrupt the writing of text (right-to-left) to write a number in the opposite order (left-to-right).

Little-endian has the property that, in the absence of alignment restrictions, variables can be read from memory at different widths without adjustment of their initial address. For example, a 32-bit variable laid out as 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), or 32-bit (0000004A). This doesn't imply, however, that little-endian has in itself a performance advantage in variable-width data access, as the speed of a system depends on its general architecture, the operating system it is running and many other factors.
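The variable-width property can be sketched with Python's standard struct module. The buffer below stands in for memory and the names are illustrative; the big-endian buffer is included only for contrast.

```python
import struct

# A 32-bit little-endian variable holding 0x4A, as laid out in memory.
le_memory = bytes.fromhex("4a000000")

# All three reads start at the same offset (address) 0.
as_8 = struct.unpack_from("<B", le_memory, 0)[0]
as_16 = struct.unpack_from("<H", le_memory, 0)[0]
as_32 = struct.unpack_from("<I", le_memory, 0)[0]
print(hex(as_8), hex(as_16), hex(as_32))  # 0x4a 0x4a 0x4a

# The same value stored big-endian: a narrower read needs an adjusted offset.
be_memory = bytes.fromhex("0000004a")
be_8_at_0 = struct.unpack_from(">B", be_memory, 0)[0]  # reads the top byte: 0
be_8_at_3 = struct.unpack_from(">B", be_memory, 3)[0]  # reads the low byte: 0x4a
```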

Examples

Except for the first, binary example below, all the examples refer to the storage in memory of the value 0a0b0c0d.

For example, consider the number 1025 (2 to the tenth power plus one) stored in a 4-byte integer:

00000000 00000000 00000100 00000001

Address   Big-endian representation of 1025   Little-endian representation of 1025
00        00000000                            00000001
01        00000000                            00000100
02        00000100                            00000000
03        00000001                            00000000
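The byte layouts of 1025 listed above can be reproduced with a short sketch. The helper name binary_rows is an assumption made for this example.

```python
def binary_rows(value, size, byteorder):
    """Return the bytes of value as 8-digit binary strings, lowest address first."""
    return [format(b, "08b") for b in value.to_bytes(size, byteorder)]

big_rows = binary_rows(1025, 4, "big")
little_rows = binary_rows(1025, 4, "little")

# Print one row per address, in the same column order as the listing above.
for address, (big, little) in enumerate(zip(big_rows, little_rows)):
    print(f"{address:02d} {big} {little}")
```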


Note: from this point forward, all numerical values in this section are in hexadecimal.

To further illustrate the above notions this section provides example layouts of a 32-bit number in the most common variants of endianness. There is no general guarantee that a platform will use one of these formats but in practice there are few if any exceptions.

Big-endian

  • With 8-bit atomic element size and 1-byte (octet) address increment:
increasing addresses  →
... 0a 0b 0c 0d ...

The most significant byte (MSB) value, which is 0a in our example, is stored at the memory location with the lowest address; the next byte value in significance, 0b, is stored at the following memory location, and so on.

  • With 16-bit atomic element size:
increasing addresses  →
... 0a0b 0c0d ...

The most significant atomic element now stores the value 0a0b (in an unspecified format), followed by 0c0d.

Little-endian

  • With 8-bit atomic element size and 1-byte (octet) address increment:
increasing addresses  →
... 0d 0c 0b 0a ...

The least significant byte (LSB) value, 0d, is at the lowest address. The other bytes follow in increasing order of significance.

  • With 16-bit atomic element size:
increasing addresses  →
... 0c0d 0a0b ...

The least significant 16-bit unit stores the value 0c0d (in an internal format), immediately followed by 0a0b.
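The big-endian and little-endian layouts of 0a0b0c0d, at both 8-bit and 16-bit atomic element sizes, can be derived with the following sketch. The function name and parameters are hypothetical, introduced only for illustration.

```python
def elements(value, unit_bits, order, total_bits=32):
    """List the atomic elements of value by increasing address, as hex strings."""
    count = total_bits // unit_bits
    mask = (1 << unit_bits) - 1
    # Start with the least significant element first (little-endian order).
    parts = [(value >> (i * unit_bits)) & mask for i in range(count)]
    if order == "big":
        parts.reverse()  # most significant element at the lowest address
    width = unit_bits // 4  # hex digits per element
    return [format(p, f"0{width}x") for p in parts]

value = 0x0A0B0C0D
print(elements(value, 8, "big"))      # ['0a', '0b', '0c', '0d']
print(elements(value, 16, "big"))     # ['0a0b', '0c0d']
print(elements(value, 8, "little"))   # ['0d', '0c', '0b', '0a']
print(elements(value, 16, "little"))  # ['0c0d', '0a0b']
```

As in the text, the sketch says nothing about how each 16-bit element is stored internally; it only orders the elements.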

Middle-endian

Still other architectures, generically called middle-endian or mixed-endian, may have a more complicated ordering; the PDP-11, for instance, stored 32-bit words, counting from the most significant byte, as: 2nd byte first, then 1st, then 4th, and finally 3rd.

  • storage of a 32-bit word on a PDP-11
increasing addresses  →
... 0b 0a 0d 0c ...

Note that this can be interpreted as storing the most significant "half" (16 bits) followed by the less significant half (as if big-endian) but with each half stored in little-endian format. This ordering is known as PDP-endianness.
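That interpretation translates directly into a short sketch: split the 32-bit value into two 16-bit halves, emit the high half first, and store each half little-endian. The function name to_pdp_endian is an assumption made for this example.

```python
def to_pdp_endian(value):
    """Lay out a 32-bit value in PDP-endian order, lowest address first."""
    high = (value >> 16) & 0xFFFF  # most significant 16-bit half
    low = value & 0xFFFF           # least significant 16-bit half
    # Halves in big-endian order, bytes within each half in little-endian order.
    return high.to_bytes(2, "little") + low.to_bytes(2, "little")

print(to_pdp_endian(0x0A0B0C0D).hex(" "))  # 0b 0a 0d 0c
```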

Endianness in networking

While the lowest network protocols may deal with sub-byte formatting, all the layers above them usually consider the byte (mostly intended as octet) as their atomic unit. The Internet Protocol defines a standard "big-endian" network byte order. This byte order is used for all numeric values in the packet headers and by many higher level protocols and file formats that are designed for use over IP. The Berkeley sockets API defines a set of functions to convert 16- and 32-bit integers to and from network byte order: the htonl (host-to-network-long) and htons (host-to-network-short) functions convert 32-bit and 16-bit values respectively from machine (host) to network order; whereas the ntohl and ntohs functions convert from network to host order.
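Python's standard socket module exposes these conversion functions under the same names, and the struct module can produce network byte order directly. The sketch below is an illustration only; the sample values and variable names are assumptions.

```python
import socket
import struct
import sys

value = 0x0A0B0C0D
port = 80

# "!" selects network (big-endian) byte order regardless of the host.
wire = struct.pack("!I", value)
print(wire.hex())  # 0a0b0c0d

# htonl/htons return integers whose in-memory bytes, on this host,
# are in network order; ntohl/ntohs undo the conversion.
net_long = socket.htonl(value)
net_short = socket.htons(port)
round_trip = socket.ntohl(net_long)

# On a big-endian host the conversions are no-ops.
print(sys.byteorder, hex(net_long), hex(net_short))
```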

"Bit endianness"

The terms bit endianness or bit-level endianness are seldom used when talking about the representation of a stored value, as they are only meaningful for the rare computer architectures that support addressing of individual bits. For bit access within a byte, most if not all architectures define the least significant bit as bit 0, and thus could be said to use little-endian bit ordering. The terms are significant, however, when referring to the transmission order of a bit-serial medium. Most often that order is transparently managed by the hardware, and sometimes it is configurable. The most common order is the bit-level analogue of little-endian (low bit first), although many media send the high bit first, such as I²C. Since Internet protocols require big-endian for bytes, bits are often transmitted over the wire low bit first but high byte first: when sending, e.g., a 32-bit integer, the first bit on the wire is the bit at position 24, followed in order by the ones at positions 25, 26…31, then 16–23, 8–15 and finally 0–7. The decision about the order of transmission of bits is made at the very bottom of the data link layer of the OSI model.
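The transmission order in that example (high byte first, low bit first within each byte) can be generated with a short sketch. The function name is hypothetical, and the sketch models only the ordering, not any particular medium.

```python
def wire_bit_positions(num_bytes=4):
    """Bit positions in transmission order: high byte first, low bit first."""
    order = []
    for byte_index in reversed(range(num_bytes)):  # most significant byte first
        for bit in range(8):                       # least significant bit first
            order.append(byte_index * 8 + bit)
    return order

positions = wire_bit_positions()
print(positions[:8])   # [24, 25, 26, 27, 28, 29, 30, 31]
print(positions[-8:])  # [0, 1, 2, 3, 4, 5, 6, 7]
```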

Notes

<references/>

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.