Problem_Reservoir Sampling - xwu36/LeetCode GitHub Wiki
Reservoir Sampling
Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Algorithm:
在取第n个数据的时候,我们生成一个0到1的随机数p,如果p小于k/n,替换池中任意一个为第n个数。大于k/n,继续保留前面的数。直到数据流结束,返回此k个数。但是为了保证计算机计算分数额准确性,一般是生成一个0到n的随机数,跟k相比,道理是一样的。
Proof:
(1)初始情况k<=n,出现在水库中的k个元素的出现概率都是一致的,都是1
(2)第一步。第一步就是指,处理第k+1个元素的情况。分两种情况:元素全部都没有被替换;其中某个元素被第k+1个元素替换掉。
我们先看情况2:第k+1个元素被选中的概率是k/(k+1)(根据公式k/i),所以这个新元素在水库中出现的概率就一定是k/(k+1)(不管它替换掉哪个元素,反正肯定它是以这个概率出现在水库中)。
下面来看水库中剩余的元素出现的概率,也就是1-P(这个元素被替换掉的概率)。水库中任意一个元素被替换掉的概率是:(k/k+1)*(1/k)=1/(k+1),意即首先要第k+1个元素被选中,然后自己在集合的k个元素中被选中。
那它出现的概率就是1-1/(k+1)=k/(k+1)。可以看出来,旧元素和新元素出现的概率是相等的。
情况1:当元素全部都没有替换掉的时候,每个元素的出现概率肯定是一样的,这很显然。但具体是多少呢?就是1-P(第k+1个元素被选中)=1-k/(k+1)=1/(k+1)。
(3)归纳法:重复上面的过程,只要证明第i步到第i+1步,所有元素出现的概率是相等的即可。
Code:
//stream代表数据流
//reservoir代表返回长度为k的池塘
//从stream中取前k个放入reservoir;
for ( int i = 1; i < k; i++)
reservoir[i] = stream[i];
for (i = k; stream != null; i++) {
p = random(0, i);
if (p < k) reservoir[p] = stream[i];
return reservoir;
382. Linked List Random Node
Problem:
Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.
Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?
Example:
// Init a singly linked list [1,2,3].
ListNode head = new ListNode(1);
head.next = new ListNode(2);
head.next.next = new ListNode(3);
Solution solution = new Solution(head);
// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.
solution.getRandom();
Code:
/**
* Definition for singly-linked list.
* public class ListNode {
* int val;
* ListNode next;
* ListNode(int x) { val = x; }
* }
*/
class Solution {
ListNode head;
Random r = new Random();
/** @param head The linked list's head.
Note that the head is guaranteed to be not null, so it contains at least one node. */
public Solution(ListNode head) {
this.head = head;
}
/** Returns a random node's value. */
public int getRandom() {
ListNode root = head;
int i = 1;
int res = root.val;
root = root.next;
while(root != null){
if(r.nextInt(i + 1) < 1){
res = root.val;
}
root = root.next;
i++;
}
return res;
}
}
/**
* Your Solution object will be instantiated and called as such:
* Solution obj = new Solution(head);
* int param_1 = obj.getRandom();
*/
398. Random Pick Index
Problem:
Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.
Note:
The array size can be very large. Solution that uses too much extra space will not pass the judge.
Example:
int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);
// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.
solution.pick(3);
// pick(1) should return 0. Since in the array only nums[0] is equal to 1.
solution.pick(1);
Code:
Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.
Note:
The array size can be very large. Solution that uses too much extra space will not pass the judge.
Example:
int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);
// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.
solution.pick(3);
// pick(1) should return 0. Since in the array only nums[0] is equal to 1.
solution.pick(1);