Today, I'm going to talk about a popular problem which is how to select "k" random elements of a list of size "n" and how to do this when the size of the list is unknown like if the list is some kind of an endless stream.
First attempt:
Randomly shuffle the list and return the first k elements. This is an easy solution, however, you would have to either modify the array or use O(N) extra storage where you'll copy the array elements before shuffling them.
How to shuffle the array?
You traverse the array from i = 0 to i = n - 1 and generate a random number from a random number generator between i and n - 1 and swap that element with i.
Code should be something like this (Error checking ommitted for simplicity)
Second attempt:
In this attempt, we'll pass through the array once and decide if we should insert the element into the output array or not.
This is an O(N) solution with no extra storage
let's see how we can implement this.
if we have an array of 100 elements and we want to choose 100 random elements, then each one of these elements should have a 100% percent chance of being in.
if after choosing some random elements we end up having 10 elements to choose from and 5 random elements to choose, then the next element should have a 50% chance of being in.
what if we have 4 elements to choose from and "0" elements choose from these four? then the next element should have a 0% chance of being inserted in the output.
this suggests the following algorithm
Third attempt:
This is an easy solution using O(K) extra storage and O(K) time complexity when you have a good pseudo-random generator, this is very efficient when N is very big and K is relatively small.
The solution uses a hashtable to check if the current item was inserted after selecting one random element from the input array.
Now what if you don't already know the size of the list? maybe it's a very big linked list that you don't want to traverse twice to know the size of or it's an endless stream, how do you choose k random elements from this list?
First think how would you do it when the list has only k elements? how do you choose 10 random elements from a 10-size list? you just insert all items in the output.
What if you get another element? the 11th element? you should choose if you should replace it with one of the items you already chose or just ignore it. This can be done by generating a number between 1 and 11 and replace the item with one already there if the generated random number is between 1 and 10.
This way items at the beginning of the array will have a big chance of being in and also a big chance of being thrown away later. Items at the end of the last will have a smaller chance of getting in but also a smaller chance of getting out once they get in.
This is known as the reservoir method.
See you in another post!
First attempt:
Randomly shuffle the list and return the first k elements. This is an easy solution, however, you would have to either modify the array or use O(N) extra storage where you'll copy the array elements before shuffling them.
How to shuffle the array?
You traverse the array from i = 0 to i = n - 1 and generate a random number from a random number generator between i and n - 1 and swap that element with i.
Code should be something like this (Error checking ommitted for simplicity)
void krandom(int *arr, int *output, int n, int k) { int *cpyArr = new int[n]; for(int i = 0; i < n; ++i) cpyArr[i] = arr[i]; for(int i = 0; i < n; ++i) { int j = rand(i, n - 1); swap(cpyArr[i], cpyArr[j]); } for(int i = 0; i < k; ++i) output[i] = cpyArr[i]; delete cpyArr; }
Second attempt:
In this attempt, we'll pass through the array once and decide if we should insert the element into the output array or not.
This is an O(N) solution with no extra storage
let's see how we can implement this.
if we have an array of 100 elements and we want to choose 100 random elements, then each one of these elements should have a 100% percent chance of being in.
if after choosing some random elements we end up having 10 elements to choose from and 5 random elements to choose, then the next element should have a 50% chance of being in.
what if we have 4 elements to choose from and "0" elements choose from these four? then the next element should have a 0% chance of being inserted in the output.
this suggests the following algorithm
void krandom(int *arr, int *output, int n, int k) { // Where to insert the element into output array int iOutput = 0; int kRest = k; int nChoose = n; for(int i = 0; i < n && kRest != 0; ++i) { // Generate a random number between 1 and // nChoose inclusive int iRand = rand(1, nChoose); // give the element a kRest/nChoose chance // of being inserted if(iRand <= k) { output[iOutput++] = arr[i]; // we have one less random item to insert --kRest; // we have one less element to choose from --nChoose; } } }
Third attempt:
This is an easy solution using O(K) extra storage and O(K) time complexity when you have a good pseudo-random generator, this is very efficient when N is very big and K is relatively small.
The solution uses a hashtable to check if the current item was inserted after selecting one random element from the input array.
void krandom(int *arr, int *output, int n, int k) { hash_map<int> tableItems; int iOutput = 0; while(tableItems.size() != k) { int iRand = rand(0, n-1); if(!tableItems.containsKey(iRand)) { tableItems.insert(iRand); output[iOutput++] = arr[iRand]; } } }
Now what if you don't already know the size of the list? maybe it's a very big linked list that you don't want to traverse twice to know the size of or it's an endless stream, how do you choose k random elements from this list?
First think how would you do it when the list has only k elements? how do you choose 10 random elements from a 10-size list? you just insert all items in the output.
What if you get another element? the 11th element? you should choose if you should replace it with one of the items you already chose or just ignore it. This can be done by generating a number between 1 and 11 and replace the item with one already there if the generated random number is between 1 and 10.
This way items at the beginning of the array will have a big chance of being in and also a big chance of being thrown away later. Items at the end of the last will have a smaller chance of getting in but also a smaller chance of getting out once they get in.
This is known as the reservoir method.
See you in another post!
No comments:
Post a Comment