How big is an integer?

In the beginning, when all was without form and void, Kernighan and Ritchie created char. And they said, “let it be of a size chosen by the compiler, guaranteed to be large enough to hold one character from the execution character set.” And so it was, and they decreed that whatever the size of this char, the compiler would call its size 1.

Right, that’s enough silly voice. There were also other integer types: short, int, long, and long long, plus pointers. The point is that on any system you could find out how big one of these numbers is (using the compiler’s sizeof operator), but that size depended on the system you were compiling for. Assuming that sizeof(char)==1 is OK, but assuming that sizeof(int)==4 will lead to trouble: it’s 2 on some systems and 8 on others, for example.
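
A quick way to see what your own toolchain chose is to print the sizes it picked; the numbers are whatever the compiler decided for the target platform, so the output differs from one system to the next.

```c
#include <stdio.h>

int main(void)
{
    /* Only sizeof(char) == 1 is guaranteed; everything else varies by platform. */
    printf("char:      %zu\n", sizeof(char));
    printf("short:     %zu\n", sizeof(short));
    printf("int:       %zu\n", sizeof(int));
    printf("long:      %zu\n", sizeof(long));
    printf("long long: %zu\n", sizeof(long long));
    printf("pointer:   %zu\n", sizeof(void *));
    return 0;
}
```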

C also provides the typedef feature, which lets you give new names to existing types. Plenty of API designers use typedef to rename integer types to give some clue as to their meaning; so you’ll see size_t used to describe the size of something, ptrdiff_t to express the difference between two pointers, and so on.
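
As an illustration (the widget_id_t name below is made up, not from any real API), a library might hide its choice of integer behind a typedef so that callers never depend on the underlying width:

```c
#include <stddef.h>   /* size_t, ptrdiff_t */

/* Hypothetical API type: callers write widget_id_t and never need to know
   which integer type sits underneath it. */
typedef unsigned long widget_id_t;

/* size_t describes a size or count; ptrdiff_t the gap between two pointers. */
size_t buffer_length(const char *start, const char *end)
{
    ptrdiff_t distance = end - start;
    return (size_t)distance;
}
```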

Leaving the sizes of the various types implementation-defined gives plenty of flexibility to implementors. A compiler for a given platform can choose to make int the same size as a CPU register, or the maximum size transferable on the data bus in one load operation. That benefits well-written software, which can be ported to hardware with different data characteristics just by recompiling. It also causes some problems for programmers whose software needs to talk to, well, anything else.

Imagine two computers communicating over a network. One of them wants to send an integer to the other, and the program represents the integer as an int. Well, the receiving computer could have a different idea of how big an int is. Maybe the sender puts four bytes onto the network, but the receiver waits forever because it wants eight. Alternatively, maybe the sender delivers eight bytes, the first four of which are incorrectly used as the integer, while the other four remain in the queue to be incorrectly used as the next value required.
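
In code, the mistake looks something like this sketch (write and read are the POSIX calls; sockfd is assumed to be an already-connected socket, and error handling is omitted):

```c
#include <unistd.h>

/* Sender: pushes however many bytes *its* int happens to occupy. */
void send_length(int sockfd, int length)
{
    write(sockfd, &length, sizeof(int));   /* could be 2, 4 or 8 bytes */
}

/* Receiver: pulls however many bytes *its* int happens to occupy.
   If the two sizes disagree, it either blocks waiting for bytes that
   never arrive or swallows part of the next value in the stream. */
void receive_length(int sockfd, int *length)
{
    read(sockfd, length, sizeof(int));
}
```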

The same problem can occur with files, too. Imagine that my app writes an int to disk. My customer then upgrades their computer, and the same app running on a different architecture tries to read the int back in. Does it get the same value? I’ve even seen this problem with two processes on the same computer, where a 64-bit kernel was talking to a 32-bit user process. [N.B. a related problem is that every process needs to agree on which byte goes where in multi-byte integers (byte order, or endianness); that problem isn’t considered here.]
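
One way to dodge both traps, sketched below, is to pick an exact width and a byte order and stick to them on both sides. It uses the uint32_t type from the C99 headers discussed in the next paragraph and writes the value byte by byte, most significant first, so it reads back identically everywhere; the function names are my own, not from any standard library.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value as exactly four bytes, most significant first. */
int write_u32(FILE *fp, uint32_t value)
{
    unsigned char bytes[4] = {
        (unsigned char)(value >> 24),
        (unsigned char)(value >> 16),
        (unsigned char)(value >> 8),
        (unsigned char)value
    };
    return fwrite(bytes, 1, 4, fp) == 4 ? 0 : -1;
}

/* Read the same four bytes back, whatever the local int size or byte order. */
int read_u32(FILE *fp, uint32_t *value)
{
    unsigned char bytes[4];
    if (fread(bytes, 1, 4, fp) != 4) return -1;
    *value = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16)
           | ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
    return 0;
}
```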

Clearly there’s a need for integer types of a stable size, guaranteed to remain the same whatever architecture the software is running on. The inttypes.h and stdint.h headers, introduced in C99 (so well over a decade ago), provide these (and more). If the target environment can provide an integer type that uses exactly eight bits, you can access it as int8_t (uint8_t for unsigned integers). Whether or not that exact-width type is available, the smallest type that holds at least eight bits is called int_least8_t. The integer type that holds at least eight bits and is fastest for the computer to handle is also available, as int_fast8_t. Conforming implementations provide the least and fast variants for 8-, 16-, 32- and 64-bit integers, and may provide types for other sizes too.
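
Here’s a minimal sketch of the distinction, assuming a C99 toolchain: the exact-width types have fixed sizes wherever they exist, while the sizes of the least and fast variants are whatever the implementation picked, so the output varies by platform.

```c
#include <inttypes.h>   /* printf format macros such as PRIu32 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t       exact = -5;   /* exactly 8 bits, where the platform offers one     */
    int_least8_t least = -5;   /* smallest type with at least 8 bits, always there  */
    int_fast8_t  fast  = -5;   /* at-least-8-bit type the platform handles fastest  */
    uint32_t     wire  = 42;   /* exactly 32 bits: safe to agree on between machines */

    printf("int8_t:       %zu byte(s)\n", sizeof exact);
    printf("int_least8_t: %zu byte(s)\n", sizeof least);
    printf("int_fast8_t:  %zu byte(s)\n", sizeof fast);
    printf("uint32_t:     %zu byte(s), value %" PRIu32 "\n", sizeof wire, wire);
    return 0;
}
```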

The point of all of this is that while there are guaranteed-size integer types available, anything that isn’t obviously of a specific size should be treated as if it’s of unknown size. Take, for example, NSInteger. It and the unsigned NSUInteger type were introduced by Apple to provide source compatibility between 32-bit and 64-bit Cocoa code, while also expanding the range of values used and returned by the API on wider platforms.

This could have been done by keeping the API as it was and changing the size of int on 64-bit Cocoa from 4 bytes to 8. That would’ve been a poor choice, because plenty of code that (wrongly) assumes sizeof(int)==4 would have broken. Most other 64-bit environments provide eight-byte longs and pointers with four-byte ints (the LP64 model), and Apple chose to follow suit for better compatibility.

Instead, NSInteger’s underlying type depends on the architecture you’re compiling for. Currently, all of Apple’s 32-bit platforms define it as int, and the 64-bit platforms define it as long. The end result is that while an NSInteger is guaranteed to be big enough to hold the length of an NSArray or an NSString, it isn’t guaranteed to be the same size as someone else’s NSInteger. Some compatibility issues therefore remain, and failing to deal with them can lead to subtle bugs that only manifest in particular situations.
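
In spirit, the definition is a conditional typedef along these lines. This is a simplified sketch rather than the exact contents of Apple’s NSObjCRuntime.h header, which checks a few more conditions than just __LP64__:

```c
#if __LP64__
typedef long NSInteger;             /* eight bytes on the 64-bit platforms */
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;              /* four bytes on the 32-bit platforms */
typedef unsigned int NSUInteger;
#endif
```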
