Possible duplicate:
Rolling median algorithm
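Suppose integers are read from a data stream. Find the median of the elements read so far in an efficient way.
The solution I have read: we can use a max-heap on the left to hold the elements smaller than the effective median, and a min-heap on the right to hold the elements greater than the effective median.
After an incoming element is processed, the sizes of the two heaps differ by at most 1. When both heaps contain the same number of elements, the effective median is the average of the two heap roots. When the heaps are not balanced, the effective median is the root of the heap that contains more elements.
But how do we construct the max-heap and the min-heap, i.e. how do we know the effective median at this point? I think we should insert one element into the max-heap, then the next element into the min-heap, and so on. Please correct me if I am wrong.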
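As a rough illustration of the scheme described above: one common way to decide where an element goes is to compare it with the root of the max-heap, rather than alternating strictly between the heaps, and then to move one root across whenever the sizes differ by more than 1. The sketch below is only an assumption-laden illustration: sorted arrays stand in for the two heaps to keep it short, and the names `insertSorted` and `runningMedians` are made up for this example.

```javascript
// Sketch only: sorted arrays stand in for the two heaps.
// `lower` plays the max-heap role (its largest value is the last item);
// `upper` plays the min-heap role (its smallest value is the first item).
function insertSorted(sorted, value) {
  let i = 0;
  while (i < sorted.length && sorted[i] <= value) i++;
  sorted.splice(i, 0, value);
}

function runningMedians(values) {
  const lower = [];
  const upper = [];
  const medians = [];
  for (const value of values) {
    // Route the value by comparing it with the largest element of the lower half.
    if (lower.length === 0 || value <= lower[lower.length - 1]) {
      insertSorted(lower, value);
    } else {
      insertSorted(upper, value);
    }
    // Rebalance: move one "root" across if the sizes differ by more than 1.
    if (lower.length > upper.length + 1) {
      upper.unshift(lower.pop());
    } else if (upper.length > lower.length + 1) {
      lower.push(upper.shift());
    }
    // Read the median off the roots.
    if (lower.length === upper.length) {
      medians.push((lower[lower.length - 1] + upper[0]) / 2);
    } else if (lower.length > upper.length) {
      medians.push(lower[lower.length - 1]);
    } else {
      medians.push(upper[0]);
    }
  }
  return medians;
}
```

For the input `[5, 15, 1, 3]` this returns `[5, 10, 5, 4]`. With real heaps in place of the arrays, each insertion costs O(log n) instead of O(n).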
Below is my simple but efficient algorithm (in C++) for calculating the running median from a stream of integers:
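#include <algorithm>
#include <fstream>
#include <list>
#include <stdexcept>
#include <vector>
using namespace std;

// Writes the median of the last bufSize integers read from ifs to ofs,
// one line per input value once the window is full.
void runningMedian(std::ifstream& ifs, std::ofstream& ofs, const unsigned bufSize) {
    if (bufSize < 1)
        throw std::invalid_argument("Wrong buffer size.");
    const bool evenSize = bufSize % 2 == 0;
    list<int> q;       // window contents in arrival order
    vector<int> nums;  // window contents in sorted order
    int n;
    unsigned count = 0;
    // Loop on the extraction itself so that a failed read is never processed.
    while (ifs >> n) {
        q.push_back(n);
        auto ub = std::upper_bound(nums.begin(), nums.end(), n);
        nums.insert(ub, n);
        count++;
        // Drop the oldest value once the window holds more than bufSize elements.
        if (nums.size() > bufSize) {
            auto it = std::lower_bound(nums.begin(), nums.end(), q.front());
            nums.erase(it);
            q.pop_front();
        }
        // Report the median only when the window holds exactly bufSize elements.
        if (nums.size() == bufSize) {
            const auto mid = nums.size() / 2;
            if (evenSize)
                ofs << count << ": "
                    << (static_cast<double>(nums[mid - 1]) + static_cast<double>(nums[mid])) / 2.0 << '\n';
            else
                ofs << count << ": " << static_cast<double>(nums[mid]) << '\n';
        }
    }
}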
The bufSize parameter specifies the length of the sequence of numbers over which the running median is calculated. While numbers are read from the input stream ifs, a vector of at most bufSize elements is maintained in sorted order. The median is the middle element of the sorted vector if bufSize is odd, or the sum of the two middle elements divided by 2 if bufSize is even. Additionally, I maintain a list of the last bufSize elements read from the input. When a new element arrives, I insert it at the right place in the sorted vector and remove from the vector the element that was added bufSize steps earlier (its value is kept at the front of the list). At the same time I remove the old element from the list: every new element is appended to the back of the list, and every old element is removed from the front. Once bufSize is reached, both the list and the vector stop growing, and every insertion of a new element is compensated by the deletion of the old element that was placed in the list bufSize steps earlier. Note that I do not care whether I remove from the vector exactly the element placed bufSize steps earlier or just an element with the same value; for the value of the median it does not matter.
All calculated median values are written to the output stream.
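The windowing scheme described above can also be sketched over an in-memory array. This is a minimal illustration under stated assumptions, not the answer's own code: the names `lowerBound` and `slidingMedians` are hypothetical, the input is an array instead of a file stream, and one median is produced for each full window of bufSize values.

```javascript
// Sketch only: `lowerBound` and `slidingMedians` are illustrative names.
// Returns the first index whose element is not less than `value`.
function lowerBound(sorted, value) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function slidingMedians(values, bufSize) {
  if (!Number.isInteger(bufSize) || bufSize < 1) {
    throw new RangeError("Wrong buffer size.");
  }
  const queue = [];   // window contents in arrival order
  const sorted = [];  // the same values kept in sorted order
  const medians = [];
  for (const value of values) {
    queue.push(value);
    sorted.splice(lowerBound(sorted, value), 0, value);
    // Once the window is over-full, drop the oldest value (or an equal one).
    if (sorted.length > bufSize) {
      const oldest = queue.shift();
      sorted.splice(lowerBound(sorted, oldest), 1);
    }
    // Emit a median only for full windows.
    if (sorted.length === bufSize) {
      const mid = bufSize >> 1;
      medians.push(
        bufSize % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
      );
    }
  }
  return medians;
}
```

For example, `slidingMedians([1, 3, 2, 6, 4], 3)` covers the windows `[1, 3, 2]`, `[3, 2, 6]` and `[2, 6, 4]` and returns `[2, 3, 4]`. Each step costs O(bufSize) because of the array insertions and deletions, which is usually acceptable for small windows.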
I can confirm that @schmil-the-cat's answer is correct.
Below is an implementation in JS. I am not an algorithms expert, but I think it may be useful to others.
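class Heap {
  constructor(isMin) {
    this.heap = [];
    this.isMin = isMin;
  }

  // True if `a` may sit above `b` in this heap.
  inOrder(a, b) {
    return this.isMin ? a <= b : a >= b;
  }

  swap(i, j) {
    [this.heap[i], this.heap[j]] = [this.heap[j], this.heap[i]];
  }

  // Sift the last element up until its parent is in order with it.
  heapify() {
    let currentIndex = this.heap.length - 1;
    while (currentIndex > 0) {
      const parentIndex = Math.floor((currentIndex - 1) / 2);
      if (this.inOrder(this.heap[parentIndex], this.heap[currentIndex])) {
        break;
      }
      this.swap(parentIndex, currentIndex);
      currentIndex = parentIndex;
    }
  }

  // Sift the root down until both of its children are in order with it.
  siftDown() {
    const n = this.heap.length;
    let i = 0;
    while (true) {
      const left = 2 * i + 1;
      const right = left + 1;
      let best = i;
      if (left < n && !this.inOrder(this.heap[best], this.heap[left])) best = left;
      if (right < n && !this.inOrder(this.heap[best], this.heap[right])) best = right;
      if (best === i) break;
      this.swap(i, best);
      i = best;
    }
  }

  insert(val) {
    this.heap.push(val);
    this.heapify();
  }

  // Remove and return the root. Array.prototype.shift() would break the heap
  // layout, so move the last element to the root and sift it down instead.
  pop() {
    if (this.heap.length === 0) return undefined;
    const val = this.heap[0];
    const last = this.heap.pop();
    if (this.heap.length > 0) {
      this.heap[0] = last;
      this.siftDown();
    }
    return val;
  }

  top() {
    return this.heap[0];
  }

  length() {
    return this.heap.length;
  }
}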
// Returns an array holding the running median after each element of arr.
function findMedian(arr) {
  const topHeap = new Heap(true);     // min-heap: upper half of the values
  const bottomHeap = new Heap(false); // max-heap: lower half of the values
  const output = [];
  for (const currentVal of arr) {
    // Route the value by comparing it with the largest value of the lower half.
    if (bottomHeap.length() === 0 || currentVal <= bottomHeap.top()) {
      bottomHeap.insert(currentVal);
    } else {
      topHeap.insert(currentVal);
    }
    // Rebalance so that the heap sizes differ by at most 1.
    if (bottomHeap.length() - topHeap.length() > 1) {
      topHeap.insert(bottomHeap.pop());
    } else if (topHeap.length() - bottomHeap.length() > 1) {
      bottomHeap.insert(topHeap.pop());
    }
    if (bottomHeap.length() === topHeap.length()) {
      // Even count: average of the two roots (not rounded down).
      output.push((bottomHeap.top() + topHeap.top()) / 2);
    } else if (bottomHeap.length() > topHeap.length()) {
      output.push(bottomHeap.top());
    } else {
      output.push(topHeap.top());
    }
  }
  return output;
}
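One way to sanity-check a heap-based implementation like the one above is to compare its output with a brute-force version that sorts every prefix. The function below is a hypothetical helper written for this purpose (the name `bruteForceMedians` is not from the answer). It returns the unrounded average for even-length prefixes, so apply the same rounding as the function under test before comparing.

```javascript
// Reference implementation for cross-checking only: O(n^2 log n) overall.
// For each prefix of `values`, sort a copy and read off the median.
function bruteForceMedians(values) {
  const medians = [];
  for (let end = 1; end <= values.length; end++) {
    const prefix = values.slice(0, end).sort((a, b) => a - b);
    const mid = Math.floor(prefix.length / 2);
    medians.push(
      prefix.length % 2 === 0 ? (prefix[mid - 1] + prefix[mid]) / 2 : prefix[mid]
    );
  }
  return medians;
}
```

For instance, `bruteForceMedians([5, 15, 1, 3])` evaluates to `[5, 10, 5, 4]`. Running both functions over a few random arrays and comparing the results element by element tends to expose heap bugs quickly.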