Saturday, July 18, 2015

Longest Substring with At Most Two Distinct Characters

Problem


Given a string S, find the length of the longest substring T that contains at most two
distinct characters.
For example,
Given S = “eceba”,
T is "ece" which its length is 3.

Analysis

To me, this is similar to find length of longest substring without repeated characters, and it is even easier. not sure why it's marked one level harder than it.

With a sliding window, try to dynamically determine valid substring's head and tail. In this problem, it's easier and naturally to adjust only the header pointer since when tail meets third distinct character, that character will become part of next substring, so as well the third distinct character's left adjacent neighbors.

When finding the third distinct character, try to adjust header to the left most character of left adjacent character. e.g. aabbbcc, when meeting first c, both c and its left bs will be part of next sub string, including all adjacent cs.

I still use set as a generic approach to determine uniqueness of characters. array with char as index can also be used, that's smart but has constraints.

Solution




 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
public static int longestSubStringAtMostTwoDistinctChar(String s){
 if(s==null)return -1;
 if(s.isEmpty())return 0;
 //this set contains characters that should be in current substring
 Set<Character> charSet = new HashSet<Character>();
 int maxLen=0,h=0,t=h;
 
 
 //having two pointers, one for head adjustment, one for substring tail adjustment
 //end of substring is marked by first character that makes set size of 3
 //
 while (h<s.length()){
  //inner loop stops while t pointing to third different character
  //this third character will become part of next substring
  //t only grows from 0 to n, when it reaches, process ends, 
  //so this is O(n) time complexity
  while(t<s.length() ){
   charSet.add(s.charAt(t));
   //after adding a character, we need check its size to tell 
   //if this is a third character
   if(charSet.size()>2)break;
   t++;
  }
  //now the tail is at position after the end of substring
  maxLen = Math.max(maxLen, t-h);
  //if tail is already at the end of string, then end the process
  if(t==s.length())return maxLen;
  //now adjust head pointer
  //e.g. abaac, it is actually easier to find the head position for next string
  //from tail to head. moving cursor from c back to a,a until it hits b, new start
  //s from a
  
  //then moving it towards head until it meets a different characer
  //adjacent character will be part of next substring
  //adjust h to t-1 the first sinnce t is now position after current substring
  h=t-1;
  while(h>0 && s.charAt(h)==s.charAt(h-1)){
   h--;

  }
  //when loop ends, h is either 0 or its previous character is different
  
  //remove the character from set
  //it loops back to here because there must be one character need to be removed
  charSet.remove(s.charAt(h==0?0:h-1));
  
  
   
  
 }
 return maxLen;
}

No comments: