Sunday, November 14, 2010

Floating Median

Today I've started to brush up my data structure knowledge. I've started with Binary Indexed Tree (BIT). TopCoder has an excellent tutorial on this. Initial segment of the tutorial is a bit hard to understand as it starts with notations and mathematical formulations but if you read that segment 2/3 times, things should be clear.

The problem I've solved requires you to -
Given a sequence of N integers in the range [0,M), find the sum of medians of N-K+1 contiguous subsequences of length K. (1<=N<=250,000; 1<=K<=1000, M=65536).


A brute force approach of sorting for each subsequence and then finding the median will time out. I've used BIT to solve the problem. As the numbers on the sequence can be at most 65536 (2^16), we can store the frequencies of each number on a BIT and when queried to find the median, we have to find the lowest element with a frequency of (K+1)/2. We will dynamically update the tree as we go through the sequence. If after adding a number on the tree, the length of the sequence goes beyond K, we will delete the (dex-K)-th element, where dex is the current index.

Finding the lowest element with a given frequency was a bit tricky. I had to start the search in the reverse order to ensure that.

Source code -

/*
Problem Name : Floating Median
Problem Number : TopCoder SRM 310 Div 1 Medium
Link : http://www.topcoder.com/stat?c=problem_statement&pm=6551&rd=9990
Problem Type : Search
Difficulty : 5.5 / 10
Interest : 8 / 10
Complexity : O(NlogM)
Date : November 14, 2010
*/

#include<string>
#include<vector>
#include<sstream>
#include<iostream>
#include<algorithm>
#include<cstdio>
#include<cstring>
#include<cstdlib>
#include<cmath>
#include<queue>
using namespace std;

#define rep(i,n) for(i=0;i<(n);i++)
#define max(a,b) (((a)>(b))?(a):(b))
#define min(a,b) (((a)<(b))?(a):(b))
#define MOD 65536
#define MAXN 250000

int tree[MOD+2];
int x[MAXN+2];

struct FloatingMedian{
 void update(int dex, int val) {
  while(dex <= MOD) {
   tree[dex] += val;
   if(dex == 0) break;
   dex += (dex & -dex);
  }
 }
 
 int find(int fre) {
  int dex = MOD, t;
  int bitMask = (1<<15);
  while(bitMask !=0 && dex >= 1) {
   t = dex - bitMask;
   if(fre <= tree[t]) dex = t;
   else fre -= tree[t];
   bitMask >>= 1;
  }
  return dex;
 }


 long long sumOfMedians(int seed, int mul, int add, int N, int K) {
  int mid = (K+1)/2;
  long long sum = 0;
  int i;
  memset(tree,0,sizeof(tree));
  rep(i,N) {
   if(i == 0) x[i] = seed;
   else x[i] =(x[i-1] * (long long)mul + add) % MOD;
   update(x[i]+1,1);
   if(i == K-1) sum += (find(mid)-1);
   if(i >= K) {
    update(x[i-K]+1, -1);
    sum += (find(mid)-1);
   }
  }
  return sum;
 }
 
};

2 comments:

  1. ami eta SRM e probably multiset diye korsilam. actually, set use korsilam, match e milte chhilo na, pore dekhsi, multiset holei systest pass kore.

    ReplyDelete
  2. Yes, I think it can be solved using any height balanced binary tree.

    I, too had participated in that SRM and I've just found out it was the SRM I've reached my highest rating on TC. I didn't submit the medium though, rather had the hard problem solved :)

    ReplyDelete