Big or Little Endian

What is Endian? How do you like your eggs? Big or little end up? If there are two equally valid ways to do something, then odds are that two different companies will choose to do them differently. This is Murphy's Law in action -- and it applies to chip designers and how they order data in memory.

If you don't understand the basics of Binary and what this means, read Binary, OCTal, HEXadecimal -- it will help substantially with this article.

Fables

In Gulliver's Travels the Lilliputians were very small, and had correspondingly small political problems. The Big-Endian and Little-Endian parties debated over which end a soft-boiled egg should be opened (the big end or the little end).

On April 1, 1980 (an ironic date), Danny Cohen wrote a now-famous paper called "On Holy Wars and a Plea for Peace" about byte ordering in words -- and applied the term "Endian" to this problem. The term immediately stuck. So endian just means which end is which.

The term "Byte Sex" is also used. Unix programmers also call this the "NUXI" problem -- since if you get the byte ordering wrong the word 'UNIX' will be scrambled and come out as 'NUXI'. But I'm getting ahead of myself -- read on and I'll explain more.

Binary Endian

Computer memory is a large array of bits (switches that are either 0 or 1). These bits are grouped into Bytes (groups of 8 bits). Then groups of Bytes are grouped into words (16 bits), long words (32 bits), quad words (64 bits), and larger groups.

The problem with any grouping is, "which end is the most significant (larger) end?"

Imagine I am encoding a value of 12 (decimal) into binary. That encodes to a 1 in the 8's column, a 1 in the 4's column, a 0 in the 2's column, and a 0 in the 1's column (1100 B). Added up, that equals 12. The 8's column is the most significant bit (MSB), since it holds the largest value; and the 1's column is the least significant bit (LSB), since it holds the smallest value.
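The column-by-column addition above can be sketched in a few lines of Python (purely illustrative -- the lists just mirror the 8's, 4's, 2's, and 1's columns from the text):

```python
# The value 12, decoded column by column as described above.
bits = [1, 1, 0, 0]           # the 8's, 4's, 2's, and 1's columns -> 1100 B
weights = [8, 4, 2, 1]        # each column's value (powers of two)
value = sum(b * w for b, w in zip(bits, weights))
print(value)                  # 12
print(value == 0b1100)        # True -- Python's own binary literal agrees
```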


Well we can encode the binary in one of two ways: either left to right, or right to left.

Notice that the value is exactly the same no matter which way it is encoded. And if we stream bits from least significant to most significant, it still doesn't matter which way they were stored. Bit ordering only matters for structures internal to a chip -- and as long as the chip is consistent, its bit-level endianness is (mostly) irrelevant to the outside world. For these reasons, almost no one discussing "endian" is talking about binary endian (bit ordering), and binary endian issues are very rare.

Almost everyone expresses binary as we express decimal values, most significant digit (bit) on the left.

Byte Ordering

Unfortunately, the order in which you store the bytes of a value (its endianness) within a larger grouping (word -- 16 bits, long word -- 32 bits, and so on) can dramatically affect the results.

Byte Ordering is what most people are talking about when they are discussing "endian" issues.

Imagine two (32 bit) processors that order bytes differently. Remember, a byte is a group of 8 bits, and can represent an ASCII character (standard encoding of some letter of the alphabet or special symbol). Now imagine what happens when those two processors try to talk to the same data --

The big-endian processor will store the bytes in one order (direction). The problem is that the other processor will read that same group of bytes back in the opposite order. So one processor would write out U-N-I-X, and the other would read it in as X-I-N-U -- assuming they were both 32-bit processors (which group data in 4-byte chunks). Big problem.
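You can watch this happen with Python's `struct` module (a sketch, standing in for the two processors): pack the four bytes of "UNIX" as one 32-bit word in big-endian order, then store that same word back out little-endian.

```python
import struct

# "Write" the 4 bytes of UNIX as one 32-bit word on a big-endian machine...
word = struct.unpack(">I", b"UNIX")[0]   # > means big-endian, I is 32 bits

# ...then "store" the same 32-bit value on a little-endian machine.
swapped = struct.pack("<I", word)        # < means little-endian
print(swapped)                           # b'XINU'
```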

This problem first started showing its ugly head back when processors were only 16-bit. In those cases, the first processor would write 'UN' and 'IX' as two 16-bit groups, and the other processor would read the first pair back as 'NU' and the next pair as 'XI' -- or 'NUXI'. Which is why some UNIX programmers call endian issues "The NUXI Problem".

The problem doesn't only apply to character streams, but to (binary) values as well. Remember, many values are combinations of bytes. A byte holds a value from 0-255 (256 total values). If two bytes (16 bits total) make up a value, then the value has a range of 0-65535 (256 x 256 total values). But one byte is the "least significant" and counts 1 times its value, while the other is the "most significant" and counts 256 times its value. So one machine writes out the MSB as 50 and the LSB as 10 -- meaning 50 x 256 + 10 = 12,810 -- but the other machine reads it back with 50 as the LSB and 10 as the MSB -- meaning 10 x 256 + 50 = 2,610. Still a problem.
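The arithmetic above can be checked directly: give Python the same two bytes (50 and 10) and ask it to interpret them both ways.

```python
# The same two bytes, interpreted with each byte order.
data = bytes([50, 10])

big = int.from_bytes(data, "big")        # first byte is the MSB: 50 * 256 + 10
little = int.from_bytes(data, "little")  # first byte is the LSB: 10 * 256 + 50

print(big, little)                       # 12810 2610
```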

Note: Just for fun, MSB sometimes means Most-Significant-Bit, and at other times means Most-Significant-Byte. And the same for LSB -- Least Significant Bit or Byte. You get to figure out which someone is talking about based on context.

Middle-Endian

This is a rarely used term (and a rarely used set of implementations) covering any non-normal byte order. Normal is big-endian (4-3-2-1) or little-endian (1-2-3-4). Imagine a 32-bit processor behaving like two 16-bit processors -- it might store either 3-4-1-2 or 2-1-4-3. That is a middle-endian 32-bit processor (and an ugly one). Some minicomputers used this for things like packed decimal formats -- but it is really rare. The term is also used by some non-U.S. programmers, who think American dates are "middle endian": Americans write mm/dd/yy, while Europeans more logically write little endian, dd/mm/yy (from least to most significant).
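One of the orderings mentioned above, 2-1-4-3, can be sketched as "split the 32-bit word into two 16-bit halves, store the low half first, each half big-endian" (a hypothetical illustration -- real middle-endian machines varied):

```python
# Hypothetical middle-endian (2-1-4-3) storage of a 32-bit word:
# the low 16-bit half goes first, and each half is stored big-endian.
def middle_endian(word32):
    hi = (word32 >> 16) & 0xFFFF    # bytes 4 and 3 (most significant half)
    lo = word32 & 0xFFFF            # bytes 2 and 1 (least significant half)
    return lo.to_bytes(2, "big") + hi.to_bytes(2, "big")

print(middle_endian(0x01020304).hex())   # '03040102' -- bytes 2-1-4-3
```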

Conclusion

There is no real solution for the Byte Ordering (endian) problems. Everyone just has to agree on how data will be stored. One processor or the other will have to translate (transpose the order) for the other one.
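That translation is just a byte swap -- a sketch of the idea, using names of my own choosing:

```python
# Transpose a 32-bit value's byte order: read the bytes as one order,
# write them back as the other. Swapping twice gets you back where you started.
def bswap32(x):
    return int.from_bytes(x.to_bytes(4, "little"), "big")

print(hex(bswap32(0x12345678)))      # 0x78563412
print(hex(bswap32(bswap32(0x12345678))))  # 0x12345678 -- round trip
```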

Some newer processors (like the PowerPC) can work "bi-endian" (they go either way) -- but the operating systems they run usually depend on one endianness or the other. So they are typically set one way when first turned on, and they seldom switch. For the record: Intel processors (x86 and Pentiums) are little endian, and Motorola processors (68000s) are big endian. MacOS is big endian, and Windows is little endian.

So "little endian" usually describes an architecture where the bytes (in a computer "word") at lower addresses have lower significance (the word is stored little end first). "Big endian" usually describes an architecture where the bytes at lower addresses have higher significance (the word is stored big end first). And the whole thing comes from Gulliver's Travels. So who says geeks have no sense of humor? We have a sense of humor -- it is just that no one else understands it without a 4-page explanation!
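Those two definitions can be seen side by side with `int.to_bytes`, where the first byte printed is the one at the lowest address:

```python
# The same 32-bit value laid out each way; lower addresses come first.
value = 0x11223344

print(value.to_bytes(4, "little").hex())  # '44332211' -- little end first
print(value.to_bytes(4, "big").hex())     # '11223344' -- big end first
```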


2000.10.25