Friday, July 17, 2015

Longest Substring Without Repeating Characters

Problem:



Given a string, find the length of the longest substring without repeating characters. For example, the longest substring without repeating letters for "abcabcbb" is "abc", which the length is 3. For "bbbbb" the longest substring is "b", with the length of 1.


Analysis



Brute force solution would be starting from each position, and use an inner loop to find longest substring without repeated character. As a normal way, I use set to hold characters already visited. (even though it's smart to use array indexed with character, set is more generic approach) and it builds up a substring from start point to position before the first repeat of character. During the process, maximum length is tracked and will be returned at last.


This will work but very inefficient since it spend most of time working on rebuilding the shorter  substrings that have been built. But this approach is a good start for people to understand how it works and why other ways are better. Basic understanding will also build up a base for further smart thought.


Solutions





  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
import java.util.HashSet;
import java.util.Set;

public class longestNoneRepeat {

 public static void main(String[] args) {
  System.out.println(lengthOfLongestSubstringBruteForceModified("abcdefabcfabcdefhijka"));

 }


 /**
  * starting from a position, using inner loop to find tail still using set
  * to tell if character is repeated, but at start of each outter loop, set
  * need to be cleared Note: this is working,but inefficient. time limit
  * exceeded is what being reported by leetcode online judge.
  * 
  * @param s
  * @return
  */
 public static int lengthOfLongestSubstringBruteForce(String s) {
  Set<Character> charSet = new HashSet<Character>();
  int i = 0, maxLen = 0;
  for (int j = 0; j < s.length(); j++) {
   charSet.clear();
   charSet.add(s.charAt(j));
   i = j + 1;
   while (i < s.length() && !charSet.contains(s.charAt(i))) {
    charSet.add(s.charAt(i));
    i++;
   }

   // maxLen = Math.max(i-j, maxLen);
   maxLen = Math.max(charSet.size(), maxLen);
  }
  return maxLen;
 }

 /**
  * to avoid some of extra useless work. this will be better than brute
  * force, but extra loop of inner loop is still there. we need it do not
  * start over
  * 
  * @param s
  * @return
  */
 public static int lengthOfLongestSubstringBruteForceModified(String s) {
  Set<Character> charSet = new HashSet<Character>();
  int j = 0, i = 0, maxLen = 0;
  while (j < s.length()) {
   charSet.clear();
   charSet.add(s.charAt(j));
   i = j + 1;
   while (i < s.length() && !charSet.contains(s.charAt(i))) {
    charSet.add(s.charAt(i));
    i++;
   }

   // maxLen = Math.max(i-j, maxLen);
   maxLen = Math.max(charSet.size(), maxLen);
   // reset j, j should be a position after the character being
   // repeated
   if (i == s.length())
    break;
   while (s.charAt(j) != s.charAt(i))
    j++;
   j++;
  }
  return maxLen;
 }

 /**
  * accepted
  * 
  * @param s
  * @return
  */
 public static int lengthOfLongestSubstringBruteForceModified2(String s) {
  Set<Character> charSet = new HashSet<Character>();
  int j = 0, i = 0, maxLen = 0;
  while (j < s.length()) {

   charSet.add(s.charAt(j));
   i = j + charSet.size();
   while (i < s.length() && !charSet.contains(s.charAt(i))) {
    charSet.add(s.charAt(i));
    i++;
   }

   // maxLen = Math.max(i-j, maxLen);
   maxLen = Math.max(charSet.size(), maxLen);
   // reset j, j should be a position after the character being
   // repeated
   if (i == s.length())
    break;
   // adjust header, remove left elements to the repeated character
   // leave the repeated character in the set because it should be in
   // next substring
   while (s.charAt(j) != s.charAt(i)) {
    charSet.remove(s.charAt(j));
    j++;
   }
   // move head to next position of the repeated character
   j++;
  }
  return maxLen;
 }
 /**
  * abcdefabcfabcdefhijka 1. The way to remember occurrence of
  * character/element is to use a character array or set/hashtable as long as
  * they provide constant time to access 2. repeated or not is only
  * meaningful to current substring that is worked on 3. if at position j it
  * has a repeat, it must be repeating a character in i. how to find i? 4.
  * when find a duplication, it means the current substring is done, try to
  * modify the max length and remove the characters from set that are no
  * longer required in next substring's building up 5.how to find current
  * substring's length: j-i+1 6. how to find position of i: 6.1 with
  * character array: a[j]=a[i] because they are the repeated character which
  * is array's index. since all valid positions (building up substring
  * without repeated character) left to i can not build up a substring longer
  * than current substring, the new position for next substring is from i+1,
  * j can continue to move forward. when next repeated character is find, the
  * next substring's length is j-i+1 6.2 with set from i until character
  * being repeated, removing from set, this includes the repeated character.
  * but since repeated character at j is part on next substring, so that we
  * put it back to set later on 7. technique of innter loop: it uses cleaning
  * up status or removing from set to control how i is found 8. i is head
  * position of substring, j is tail position of substring
  * 
  * @param s
  * @return
  */
 public static int lengthOfLongestSubstringSet(String s) {
  Set<Character> charSet = new HashSet<Character>();
  int i = 0, maxLen = 0;
  for (int j = 0; j < s.length(); j++) {
   while (charSet.contains(s.charAt(j))) {
    charSet.remove(s.charAt(i));
    i++;
   }
   charSet.add(s.charAt(j));
   maxLen = Math.max(j - i + 1, maxLen);
  }
  return maxLen;
 }

}

No comments: