Binary search for students

I'm putting together an example for students in which I'd like to show how we can optimize even such a basic algorithm as binary search with respect to the hardware. I do not consider threads. I wrote it in C++, and the whole example can be found here. I'm trying to show some examples of prefetching.



Test data

If we want to see the effects of the optimizations proposed in the following examples, we must use data that are not in the CPU cache.



We are going to test it using some code like this:



// arr - array with at least 100MB of values
// test_arr - array with values that we are going to search for in arr
// test_count - size of test_arr
for (auto i = 0u; i < test_count; ++i) {
    auto position = binarySearch(item_count, arr, test_arr[i]);
}


Size - If the arr array is significantly larger than the largest CPU cache, then most of the array won't fit there and we will inevitably experience L2 cache misses during the run. L2 cache misses are slow, and they become the major bottleneck of our binary search.



Randomness - It is better for the values in test_arr to hit arr at random positions; otherwise, the basic binary search algorithm will benefit from data access locality.



Please note that L2 cache misses can occur even on a small array inside some complex algorithm that works with a lot of data. In other words, we need the large array only for testing purposes; the optimization can be useful anywhere we expect binary search to run on data outside the CPU cache.



The basic algorithm

It is something along these lines:



using Type = int64_t;

int binarySearch_basic(size_t item_count, const Type* arr, int search)
{
    int l = 0;
    int r = item_count - 1;
    while (l <= r) {
        int m = l + ((r - l) >> 1);
        if (arr[m] == search) return m;
        if (arr[m] < search) l = m + 1;
        else r = m - 1;
    }
    return -1;
}


Prefetch

The first optimization simply tries to prefetch the possibly needed memory in advance:



int binarySearch_prefetch(size_t item_count, const Type* arr, int search)
{
    int l = 0;
    int r = item_count - 1;
    while (l <= r) {
        Type m1 = arr[l + ((r - l) >> 2)]; // new code: touch the two possible
        Type m2 = arr[r - ((r - l) >> 2)]; // midpoints of the next iteration

        int m = l + ((r - l) >> 1);
        if (arr[m] == search) return m;
        if (arr[m] < search) l = m + 1;
        else r = m - 1;
    }
    return -1;
}


More items per iteration

The second optimization is a little more sophisticated. It accesses more than one item per iteration, which lets us read more relevant data per L2 cache-miss wait:



int binarySearch_duo(size_t item_count, const Type* arr, int search)
{
    int l = 0;
    int r = item_count - 1;
    while (r - l > 3) {
        int delta = (r - l) / 3;
        int m1 = l + delta;
        int m2 = r - delta;

        if (arr[m1] == search) return m1;
        if (arr[m2] == search) return m2;

        if (arr[m1] < search) {
            if (arr[m2] < search) {
                l = m2 + 1;
            }
            else {
                r = m2 - 1;
                l = m1 + 1;
            }
        }
        else {
            r = m1 - 1;
        }
    }

    for (int i = l; i <= r; i++) {
        if (arr[i] == search) return i;
    }

    return -1;
}


Of course, this approach can be done with more accesses per iteration (I'm testing three as well in my demo). There are infinite-loop issues when the (l, r) interval is very small (fewer than three items), so I finish the remaining items with a simple sequential search to keep the main loop clean and simple.



I did my tests with the Visual Studio 2017 C++ compiler.



I have several questions:




  • Are there any other possible significant optimizations that could help the algorithm?

  • Is my explanation of the improvement in the last algorithm correct? (Is it really because the CPU reads the two data items in parallel, and therefore waits for memory just once, not twice as in the first algorithm?)

  • Are there any other comments about my code?










  • Please do not update the code in your question to incorporate feedback from answers; doing so goes against the Question & Answer style of Code Review. Please see what you may and may not do after receiving answers. There is no immediate problem, but if you add more code we'll have to roll back the edit. Should your code be improved over time, post a follow-up question linking back to this one instead. Give it some time, though. – Mast, 14 mins ago
















Tags: c++, performance, binary-search






1 Answer
How did you measure the performance of your code (test code, hardware, compiler, etc.)? Without that, it is impossible to get into optimization. Depending on the chosen compiler I was sort of able to reproduce your results; see this quick bench: for clang 7.0.0 the fancy algorithm is indeed faster. That being said, with gcc the basic one is a lot faster (outperforming the fancy algorithm in both clang and gcc).

With that in mind, you can optimize your algorithm for different array sizes quite easily (benchmarking on your specific setup) by choosing the fastest of the three for specific sizes.

As far as the code is concerned, there are a couple of things you can improve.

You could rewrite your algorithms so that they accept iterators, just like the algorithms from the standard library; that will drastically improve their flexibility without added effort (and most probably without degrading performance, but you should measure it anyway). Apart from that, you should avoid implicit casts, e.g. here:

int r = item_count - 1;

You should also prefer to keep the bodies of your functions short; it's hard to understand what binarySearch_duo actually does (you could easily extract the body of the loop, etc.). Also, avoid magic constants: where does the seven in while (r - l > 7) come from?

Edit: I've created one more benchmark for this, this time as proposed by the OP, that is, with a huge array (actually a vector) and random access to its elements; see here. This time, with gcc, the fancy version is as good as the basic one. The lesson should probably be: the compiler is usually at least as smart as you are :).






  • You are using a small array that fits into the L2 cache (the CPU type is unknown). You can try to extend the array to 1M elements to see the fancy code win. However, it is still quite a small array, and L2 cache misses do not occur significantly often. The number 7 is not important for performance; it can be any small number larger than 3. I selected 7 since it approximately corresponds to one 64-byte cache line. – Radim Bača, 1 hour ago

  • Not necessarily: gcc, even after increasing the element count to 10^6, produces better times for the basic implementation. What I'm trying to say is that your results will vary depending on your benchmark, and if you don't provide a reliable way to measure performance, it's tough to recommend further optimisations. – paler123, 1 hour ago

  • You are right. It is true that I did not specify the benchmark clearly here. I'll update the question. – Radim Bača, 1 hour ago

  • I have updated the question. I hope it is clearer now. Thank you for your recommendations. – Radim Bača, 49 mins ago

  • Concerning your benchmark: it is not large enough, and the queries into the array are not random. – Radim Bača, 42 mins ago











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






Radim Bača is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212088%2fbinary-search-for-students%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









4












$begingroup$

How did you measure the performance of your code (test code, hardware, compiler, etc.)? Without that it is impossible to get into optimization. Depending on chosen compiler I was sort of able to reproduce your results, see this quick bench- for clang 7.0.0 the fancy algorithm is indeed faster. That being said with gcc the basic one is a lot faster (outperforming fancy algorithm in both clang and gcc).

With that in mind, you can optimize your algorithm for different array sizes quite easily (benchmarking on your specific setup), by choosing fastest of the 3 for specific sizes.



As far as the code is concerned, there are a couple of things you can improve.



You could rewrite your algorithms so that they accept iterators, just like all the algorithms from standard library, that will drastically improve their flexibility without added effort (and most probably without degrading performance, but you should measure it anyway).
Apart from that, you should avoid implicit casts, e.g. here:



int r = item_count - 1;


You should also prefer to keep the bodies of your functions short, it's hard to understand what binarySearch_duo actually does (you could easily extract body of the loop etc). Also, avoid magic constants, where does the seven in: while (r - l > 7) { come from?



Edit: I've creted one more benchmark for this, this time as proposed by OP, that is with huge array size (actually a vector) and made the access to the elements of the array random, see here- this time with gcc fancy version is as good as basic. The lesson should probably be: compiler is usually at least as smart as you are :).






share|improve this answer










New contributor




paler123 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$













  • $begingroup$
    You are using a small array that fits into the L2 cache (the CPU type is unknown). You can try to extend the size of the array to 1M to see the fancy code to win. However, it is still quite a small array and L2 cache misses do not occur significantly often. The number 7 is not important for the performance, it can be any small number lager than 3. I selected 7 since it approximately corresponds to one 64 cache line.
    $endgroup$
    – Radim Bača
    1 hour ago






  • 1




    $begingroup$
    Not necessarily, gcc even after increasing the count of elements in the array to 10^6 elements produces better times for basic implementation. What I'm trying to say is that your results will vary depending on your benchmark and if you won't provide reliable way to measure performance, then it's tough to recommend further optimisations.
    $endgroup$
    – paler123
    1 hour ago










  • $begingroup$
    You are right. It is true that I did not specify the benchmark clearly here. I'll update the question.
    $endgroup$
    – Radim Bača
    1 hour ago












  • $begingroup$
    I have updated the question. I hope it is more clear now. Thank you for your recommendations.
    $endgroup$
    – Radim Bača
    49 mins ago










  • $begingroup$
    Concerning your benchmark - it is not large enough and also the queries into the array are not random.
    $endgroup$
    – Radim Bača
    42 mins ago
















4












$begingroup$

How did you measure the performance of your code (test code, hardware, compiler, etc.)? Without that it is impossible to get into optimization. Depending on chosen compiler I was sort of able to reproduce your results, see this quick bench- for clang 7.0.0 the fancy algorithm is indeed faster. That being said with gcc the basic one is a lot faster (outperforming fancy algorithm in both clang and gcc).

With that in mind, you can optimize your algorithm for different array sizes quite easily (benchmarking on your specific setup), by choosing fastest of the 3 for specific sizes.



As far as the code is concerned, there are a couple of things you can improve.



You could rewrite your algorithms so that they accept iterators, just like all the algorithms from standard library, that will drastically improve their flexibility without added effort (and most probably without degrading performance, but you should measure it anyway).
Apart from that, you should avoid implicit casts, e.g. here:



int r = item_count - 1;


You should also prefer to keep the bodies of your functions short, it's hard to understand what binarySearch_duo actually does (you could easily extract body of the loop etc). Also, avoid magic constants, where does the seven in: while (r - l > 7) { come from?



Edit: I've creted one more benchmark for this, this time as proposed by OP, that is with huge array size (actually a vector) and made the access to the elements of the array random, see here- this time with gcc fancy version is as good as basic. The lesson should probably be: compiler is usually at least as smart as you are :).






share|improve this answer










New contributor




paler123 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$














$begingroup$

How did you measure the performance of your code (test code, hardware, compiler, etc.)? Without that information it is impossible to get into optimization. Depending on the chosen compiler I was sort of able to reproduce your results, see this quick bench: for clang 7.0.0 the fancy algorithm is indeed faster. That being said, with gcc the basic one is a lot faster (outperforming the fancy algorithm under both clang and gcc).

With that in mind, you can optimize your algorithm for different array sizes quite easily (benchmarking on your specific setup) by choosing the fastest of the three for each size range.



As far as the code is concerned, there are a couple of things you can improve.



You could rewrite your algorithms so that they accept iterators, just like the algorithms from the standard library. That would drastically improve their flexibility without added effort (and most probably without degrading performance, but you should measure it anyway).
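To make that concrete, here is a minimal sketch (the name and interface are illustrative, not the OP's actual code) of the basic search re-expressed over a random-access iterator pair:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch: the basic binary search, but taking an iterator pair
// like the standard algorithms do. Returns an iterator to a matching element,
// or `last` if the value is absent.
template <typename RandomIt, typename T>
RandomIt binarySearch_basic_it(RandomIt first, RandomIt last, const T& search)
{
    RandomIt lo = first;
    RandomIt hi = last;
    while (lo < hi) {
        RandomIt mid = lo + (hi - lo) / 2; // overflow-safe midpoint
        if (*mid == search) return mid;
        if (*mid < search)  lo = mid + 1;  // continue in the upper half
        else                hi = mid;      // continue in the lower half
    }
    return last; // not found
}
```

The same template then works for a std::vector, a std::array, or a raw pointer range, and the compiler can still inline everything.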
Apart from that, you should avoid implicit casts, e.g. here:



int r = item_count - 1;
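The issue is that item_count is a size_t, so when item_count == 0 the unsigned expression item_count - 1 wraps around to a huge value before being narrowed into the int. One hedged way to make the conversion explicit (the helper name is mine, not from the original code):

```cpp
#include <cstddef>

// Sketch: make the size_t -> signed conversion explicit instead of implicit.
// With an empty array this yields -1, so a loop like `while (l <= r)` with
// l == 0 exits immediately, which is the behaviour the basic search relies on.
inline std::ptrdiff_t last_index(std::size_t item_count)
{
    return static_cast<std::ptrdiff_t>(item_count) - 1;
}
```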


You should also prefer to keep the bodies of your functions short; it's hard to understand what binarySearch_duo actually does (you could easily extract the body of the loop, for instance). Also, avoid magic constants: where does the seven in while (r - l > 7) { come from?
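As an illustration of replacing the magic constant, one could name it after the intent the OP later explains in the comments (the constant and function names here are my invention):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: document *why* the linear-scan cutoff is ~7. Eight int64_t values
// occupy exactly one 64-byte cache line, so once the remaining range is that
// narrow, a linear scan touches at most a cache line or two anyway.
constexpr std::ptrdiff_t kValuesPerCacheLine = 64 / sizeof(std::int64_t); // == 8

constexpr bool worthBisecting(std::ptrdiff_t l, std::ptrdiff_t r)
{
    // Equivalent to the bare `r - l > 7` in the original loop condition.
    return r - l >= kValuesPerCacheLine;
}
```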



Edit: I've created one more benchmark for this, this time as proposed by the OP, that is, with a huge array (actually a vector) and random access to its elements, see here. This time, with gcc, the fancy version is as good as the basic one. The lesson should probably be: the compiler is usually at least as smart as you are :).
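For readers who want to reproduce this kind of measurement, here is a minimal sketch of the setup the question describes (sizes, seeds, and names are illustrative): a sorted array much larger than the last-level cache, queried in random order so consecutive searches share no locality.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Build a sorted array of n elements (0..n-1). For a cache-hostile benchmark,
// n should be chosen so the array is far larger than the last-level cache
// (the question suggests ~100 MB of int64_t values).
std::vector<std::int64_t> make_sorted_array(std::size_t n)
{
    std::vector<std::int64_t> arr(n);
    std::iota(arr.begin(), arr.end(), std::int64_t{0});
    return arr;
}

// Draw `count` uniformly random keys from [0, n), so the probe sequence has
// no access locality a basic binary search could exploit.
std::vector<std::int64_t> make_random_queries(std::size_t n, std::size_t count,
                                              std::uint64_t seed)
{
    std::mt19937_64 gen(seed);
    std::uniform_int_distribution<std::int64_t> dist(
        0, static_cast<std::int64_t>(n) - 1);
    std::vector<std::int64_t> queries(count);
    for (auto& q : queries) q = dist(gen);
    return queries;
}
```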






paler123 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$



edited 23 mins ago
answered 2 hours ago









paler123

  • $begingroup$
    You are using a small array that fits into the L2 cache (the CPU type is unknown). You can try to extend the size of the array to 1M elements to see the fancy code win. However, that is still quite a small array and L2 cache misses do not occur significantly often. The number 7 is not important for the performance; it can be any small number larger than 3. I selected 7 since it approximately corresponds to one 64-byte cache line.
    $endgroup$
    – Radim Bača
    1 hour ago






  • 1




    $begingroup$
    Not necessarily: gcc, even after increasing the count of elements in the array to 10^6, produces better times for the basic implementation. What I'm trying to say is that your results will vary depending on your benchmark, and if you don't provide a reliable way to measure performance, then it's tough to recommend further optimisations.
    $endgroup$
    – paler123
    1 hour ago










  • $begingroup$
    You are right. It is true that I did not specify the benchmark clearly here. I'll update the question.
    $endgroup$
    – Radim Bača
    1 hour ago












  • $begingroup$
    I have updated the question. I hope it is more clear now. Thank you for your recommendations.
    $endgroup$
    – Radim Bača
    49 mins ago










  • $begingroup$
    Concerning your benchmark: it is not large enough, and the queries into the array are not random.
    $endgroup$
    – Radim Bača
    42 mins ago


















Radim Bača is a new contributor. Be nice, and check out our Code of Conduct.









