Jiaqin's Technical Notes: A variation on substring problem

Jan 23, 2009

A variation on substring problem

A variation on substring problem
You have a string of N characters, none of which are repeated.
For each character you are given an array of integers which contains all the locations of that character within another string (call it Big One) with length M > N
Using these N arrays of locations devise the algorithm that would calculate the smallest subset in Big One containing all of N characters.

suppose we don't know the size of character space. the N characters are { N1, N2, ..., NN }
N1's index is index1[size1]
N2's index is index2[size2]
...
NN's index is indexN[sizeN]
step1:
sort each index array order by position
step2:
build a heap size of N. heap item type
struct HeapNode{
int position; //Key, indicate position of the character
char character; //character
};
step3:
you know first elements in each index array is smallest elem in that array. insert these elements( elem position + character info ) into heap H.
minE = smallest element of H;
maxE = bigest element of H; //This we can't derive explicit from Heap property. so we must use a variable record it.
we know H has all the N characters now.
step4:
extract min element(smallest position) from heap H.
range = maxE.position - minE.position ;
if ( range < minRange ) minrange = range;
And get the next position e of array minE.character. if no such element then algorithm ends,
else if ( e.position > maxE.position )
{
maxE = e;
}
insert e to heap H.
Do step4 until algorithm end;
complexity of this algorithm is.
sort O(Mlog(M))
+
extract all the elements: O(Mlog(N))
so it's a O(Mlog(M)) algorithm.

Jiaqin's Technical Notes

Jan 23, 2009

A variation on substring problem

No comments:

Post a Comment

Brief Description

Labels