Почему то, что структуры данных обычно имеют размер 2^n?

Question

Почему то, что структуры данных обычно имеют размер 2^n?

Кажется, что это FAQ, и предлагаемое разрешение:

Простой поиск (Ctrl+H) без regexp

можно повернуть выставленный для обозрения Выставочный Конец / Строки или просмотреть/Показать Все и выбрать теперь видимые символы новой строки. Тогда при запуске команды, некоторые символы, соответствующие символу новой строки, будут вставляться в поле поиска. Соответствия будут заменены строкой замены, в отличие от этого, в regex режиме.

Примечание 1: при выборе их с мышью запустите незадолго до них и перетащите к запуску следующей строки. Перетаскивание до конца строки не будет работать.

Примечание 2: Вы не можете скопировать и вставить их в поле сами.

Расширенный поиск (Ctrl+R) без regexp

Ctrl+M вставит что-то, что соответствует новым строкам. Они будут заменены строкой замены.

5

c data-structures size buffer

задан Brian Tompsett - 汤莱恩 8 November 2015 в 22:51

10 ответов

Строки кэша обычно кратны 2 (часто 32 или 64). Данные, кратные этому числу, могут уместиться (и полностью использовать) соответствующее количество строк кэша. Чем больше данных вы можете упаковать в свой кеш, тем лучше будет производительность ... поэтому я думаю, что люди, которые проектируют свои структуры таким образом, оптимизируются для этого.

8

ответ дан 18 December 2019 в 06:35

Another reason in addition to what everyone else has mentioned is, SSE instructions take multiple elements, and the number of elements input is always some power of two. Making the buffer a power of two guarantees you won't be reading unallocated memory. This only applies if you're actually using SSE instructions though.

I think in the end though, the overwhelming reason in most cases is that programmers like powers of two.

2

ответ дан 18 December 2019 в 06:35

Hash Tables, Allocation by Pages

This really helps for hash tables, because you compute the index modulo the size, and if that size is a power of two, the modulus can be computed with a simple bitwise-and or & rather than using a much slower divide-class instruction implementing the % operator.

Looking at an old Intel i386 book, and is 2 cycles and div is 40 cycles. A disparity persists today due to the much greater fundamental complexity of division, even though the 1000x faster overall cycle times tend to hide the impact of even the slowest machine ops.

There was also a time when malloc overhead was occasionally avoided at great length. Allocation's available directly from the operating system would be (still are) a specific number of pages, and so a power of two would be likely to make the most use of the allocation granularity.

And, as others have noted, programmers like powers of two.

2

ответ дан 18 December 2019 в 06:35

I can think of a few reasons off the top of my head:

2^n is a very common value in all of computer sizes. This is directly related to the way bits are represented in computers (2 possible values), which means variables tend to have ranges of values whose boundaries are 2^n.
Because of the point above, you'll often find the value 256 as the size of the buffer. This is because it is the largest number that can be stored in a byte. So, if you want to store a string together with a size of the string, then you'll be most efficient if you store it as: SIZE_BYTE+ARRAY, where the size byte tells you the size of the array. This means the array can be any size from 1 to 256.
Many other times, sizes are chosen based on physical things (for example, the size of the memory an operating system can choose from is related to the size of the registers of the CPU etc) and these are also going to be a specific amount of bits. Meaning, the amount of memory you can use will usually be some value of 2^n (for a 32bit system, 2^32).
There might be performance benefits/alignment issues for such values. Most processors can access a certain amount of bytes at a time, so even if you have a variable whose size is let's say) 20 bits, a 32 bit processor will still read 32 bits, no matter what. So it's often times more efficient to just make the variable 32 bits. Also, some processors require variables to be aligned to a certain amount of bytes (because they can't read memory from, for example, addresses in the memory that are odd). Of course, sometimes it's not about odd memory locations, but locations that are multiples of 4, or 6 of 8, etc. So in these cases, it's more efficient to just make buffers that will always be aligned.

Ok, those points came out a bit jumbled. Let me know if you need further explanation, especially point 4 which IMO is the most important.

1

ответ дан 18 December 2019 в 06:35

Because of the simplicity (read also cost) of base 2 arithmetic in electronics: shift left (multiply by 2), shift right (divide by 2).

In the CPU domain, lots of constructs revolve around base 2 arithmetic. Busses (control & data) to access memory structure are often aligned on power 2. The cost of logic implementation in electronics (e.g. CPU) makes for arithmetics in base 2 compelling.

Of course, if we had analog computers, the story would be different.

FYI: the attributes of a system sitting at layer X is a direct consequence of the server layer attributes of the system sitting below i.e. layer < x. The reason I am stating this stems from some comments I received with regards to my posting.

E.g. the properties that can be manipulated at the "compiler" level are inherited & derived from the properties of the system below it i.e. the electronics in the CPU.

1

ответ дан 18 December 2019 в 06:35

I was going to use the shift argument, but could think of a good reason to justify it.

One thing that is nice about a buffer that is a power of two is that circular buffer handling can use simple ands rather than divides:

#define BUFSIZE 1024

++index;                // increment the index.
index &= BUFSIZE;       // Make sure it stays in the buffer.

If it weren't a power of two, a divide would be necessary. In the olden days (and currently on small chips) that mattered.

0

ответ дан 18 December 2019 в 06:35

It's also common for pagesizes to be powers of 2.

On linux I like to use getpagesize() when doing something like chunking a buffer and writing it to a socket or file descriptor.

0

ответ дан 18 December 2019 в 06:35

It's makes a nice, round number in base 2. Just as 10, 100 or 1000000 are nice, round numbers in base 10.

If it wasn't a power of 2 (or something close such as 96=64+32 or 192=128+64), then you could wonder why there's the added precision. Not base 2 rounded size can come from external constraints or programmer ignorance. You'll want to know which one it is.

Other answers have pointed out a bunch of technical reasons as well that are valid in special cases. I won't repeat any of them here.

0

ответ дан 18 December 2019 в 06:35

В хеш-таблицах 2 ^ n упрощает обработку столкновений ключей определенным образом. Как правило, при конфликте ключей вы либо создаете подструктуру, например, список, всех записей с одинаковым значением хеш-функции; или вы найдете другой свободный слот. Вы можете просто добавить 1 к индексу слота, пока не найдете свободный слот; но эта стратегия не оптимальна, так как создает кластеры заблокированных мест. Лучшая стратегия - вычислить второе хеш-число h2, так что gcd (n, h2) = 1; затем добавьте h2 к индексу слота, пока не найдете свободный слот (с переносом). Если n является степенью 2, найти h2, удовлетворяющее НОД (n, h2) = 1, легко, подойдет любое нечетное число.

0

ответ дан 18 December 2019 в 06:35

Другие вопросы по тегам:

c data-structures size buffer

Почему то, что структуры данных обычно имеют размер 2^n?

10 ответов

Hash Tables, Allocation by Pages

Похожие вопросы: