Problem_Reservoir Sampling - xwu36/LeetCode GitHub Wiki

Reservoir Sampling

Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Algorithm:

在取第n个数据的时候,我们生成一个0到1的随机数p,如果p小于k/n,替换池中任意一个为第n个数。大于k/n,继续保留前面的数。直到数据流结束,返回此k个数。但是为了保证计算机计算分数额准确性,一般是生成一个0到n的随机数,跟k相比,道理是一样的。

Proof:

(1)初始情况k<=n,出现在水库中的k个元素的出现概率都是一致的,都是1
(2)第一步。第一步就是指,处理第k+1个元素的情况。分两种情况:元素全部都没有被替换;其中某个元素被第k+1个元素替换掉。
我们先看情况2:第k+1个元素被选中的概率是k/(k+1)(根据公式k/i),所以这个新元素在水库中出现的概率就一定是k/(k+1)(不管它替换掉哪个元素,反正肯定它是以这个概率出现在水库中)。
下面来看水库中剩余的元素出现的概率,也就是1-P(这个元素被替换掉的概率)。水库中任意一个元素被替换掉的概率是:(k/k+1)*(1/k)=1/(k+1),意即首先要第k+1个元素被选中,然后自己在集合的k个元素中被选中。
那它出现的概率就是1-1/(k+1)=k/(k+1)。可以看出来,旧元素和新元素出现的概率是相等的。
情况1:当元素全部都没有替换掉的时候,每个元素的出现概率肯定是一样的,这很显然。但具体是多少呢?就是1-P(第k+1个元素被选中)=1-k/(k+1)=1/(k+1)。
(3)归纳法:重复上面的过程,只要证明第i步到第i+1步,所有元素出现的概率是相等的即可。

Code:

//stream代表数据流  
//reservoir代表返回长度为k的池塘  
  
//从stream中取前k个放入reservoir;  
for ( int i = 1; i < k; i++)  
    reservoir[i] = stream[i];  
for (i = k; stream != null; i++) {  
    p = random(0, i);  
    if (p < k) reservoir[p] = stream[i];  
return reservoir;  

382. Linked List Random Node

Problem:

Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.

Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?

Example:

// Init a singly linked list [1,2,3].
ListNode head = new ListNode(1);
head.next = new ListNode(2);
head.next.next = new ListNode(3);
Solution solution = new Solution(head);

// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.
solution.getRandom();

Code:

/**
 * Definition for singly-linked list.
 * public class ListNode {
 *     int val;
 *     ListNode next;
 *     ListNode(int x) { val = x; }
 * }
 */
class Solution {

    ListNode head;
    Random r = new Random();
    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    public Solution(ListNode head) {
        this.head = head;
    }
    
    /** Returns a random node's value. */
    public int getRandom() {
        ListNode root = head;
        int i = 1;
        int res = root.val;
        root = root.next;
        while(root != null){
            if(r.nextInt(i + 1) < 1){
                res = root.val;
            }
            root = root.next;
            i++;
        }
        return res;
    }
}

/**
 * Your Solution object will be instantiated and called as such:
 * Solution obj = new Solution(head);
 * int param_1 = obj.getRandom();
 */

398. Random Pick Index

Problem:

Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.

Note:
The array size can be very large. Solution that uses too much extra space will not pass the judge.

Example:

int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);

// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.
solution.pick(3);

// pick(1) should return 0. Since in the array only nums[0] is equal to 1.
solution.pick(1);

Code:

Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.

Note:
The array size can be very large. Solution that uses too much extra space will not pass the judge.

Example:

int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);

// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.
solution.pick(3);

// pick(1) should return 0. Since in the array only nums[0] is equal to 1.
solution.pick(1);