Solving the "Longest Substring Without Repeating Characters" Problem in C#
Here is my walkthrough on this challenge:
In this blog post (and YouTube video), we'll embark on a journey to solve a classic LeetCode problem: finding the length of the longest substring without repeating characters. We'll explore both a brute force method and an optimized solution in C#.
The Problem Statement
The challenge we face is to determine the length of the longest substring in a given string s
in which no character is repeated.
What is a substring?
LeetCode defines a substring as, “a contiguous non-empty sequence of characters within a string.”. In simple terms, a substring is a smaller part of a longer text or sequence, created by taking consecutive characters or elements from the original without skipping any.
With that out of the way, let’s dive into a few examples to understand the problem:
Example 1:
- Input: "abcabcbb"
- Output: 3
- Explanation: The answer is "abc," with a length of 3.
Example 2:
- Input: "bbbbb"
- Output: 1
- Explanation: The answer is "b," with a length of 1.
Example 3:
- Input: "pwwkew"
- Output: 3
- Explanation: The answer is "wke," with a length of 3.
It's vital to remember that the answer must be a substring, not a subsequence. For instance, "pwke" is a subsequence but not a substring.
HashSets vs. Dictionaries
Now, let's address the critical question: Why do we use a hash set in this problem instead of a dictionary?
HashSets:
- Simplicity: Hash sets are designed to store unique elements. They provide constant-time (O(1)) complexity for lookups, making them efficient for checking whether a character is already in the hash set.
- Tracking Unique Elements: In this problem, we only need to track unique characters, not associated values or key-value pairs. A hash set is simple and efficient for this purpose.
Dictionaries:
- Key-Value Pairs: Dictionaries are useful when you need to associate values with keys. In this problem, using a dictionary to map characters to their positions in the string would introduce additional complexity without significant benefits.
- Additional Information: Dictionaries are ideal when you need to store and retrieve additional information related to the elements. However, in this context, we only need to track the presence or absence of characters.
Hash Functions:
Both hash sets and dictionaries use hash functions to store and retrieve data efficiently. A hash function takes an input (e.g., a character) and produces a fixed-size value (a hash code) that represents that input. This hash code is used as an index in an internal data structure (e.g., an array or a hash table) to store and retrieve the associated value (in the case of a dictionary) or the presence of the element (in the case of a hash set). The goal of a good hash function is to distribute the elements uniformly across the available slots, reducing collisions and ensuring efficient lookup.
Brute Force Approach
Before we dive into the optimized solution, let's first explore the brute force approach. This method involves checking all possible substrings in the given string to find the longest one without repeating characters. It's an excellent starting point to appreciate the efficiency of the optimized solution.
The steps for the brute force approach include:
- We start by defining a method called
LengthOfLongestSubstringOptimized
, which takes a strings
as input. - We initialize a variable
maxLen
to 0. This variable will keep track of the maximum length of a substring without repeating characters. It's initialized to 0 at the beginning. - Next, we create a loop that will iterate through each character in the string
s
as a potential starting character for a substring. - Inside the loop, we create a
HashSet<char>
calledcharSet
. This set will be used to keep track of unique characters within the current substring. - We initialize an integer variable
len
to 0. This variable will be used to count the length of the current substring without repeating characters. It is set to 0 at the beginning of each iteration because we are starting with an empty substring. - We create a nested loop that starts at the same position as the outer loop (
i
) and iterates through the rest of the string using thej
variable. This inner loop is used to build the current substring. - Inside the inner loop, we check if the character at the
j
position in the strings
is already in thecharSet
hash set. If it's in the hash set, that means we've encountered a repeating character in our current substring. - If a repeating character is found, we exit the inner loop using the
break
statement because we want to stop building the current substring. We have encountered a repeating character, and we need to move on to the next starting position for a new substring. - If the character is not in the
charSet
, it means it's a new, unique character within the current substring. In that case, we add it to thecharSet
to keep track of it. - We also increment the
len
variable to denote that we've added another character to the current substring. This variable will help us keep track of the length of the current non-repeating substring. - Finally, after the inner loop, we calculate the length of the current substring (
len
) and update themaxLen
variable. We useMath.Max(maxLen, len)
to ensure thatmaxLen
holds the maximum length found so far. Iflen
is greater than the currentmaxLen
, we updatemaxLen
with the new value. - After all the iterations, the
maxLen
variable will contain the length of the longest substring without repeating characters within the input strings
. - We return
maxLen
as the result of the function.
Here's the C# code for the brute force approach:
Optimized Approach
The brute force approach works but is not the most efficient solution. To improve performance, we can utilize a two-pointer technique to avoid redundant checks and make the solution more efficient.
The Two-Pointer Technique
Now, let's explore the concept of the two-pointer technique, which is an integral part of the optimized solution.
The two-pointer technique involves maintaining two pointers, typically called start
and end
, and using them to traverse an array or a string. This technique is particularly useful for solving problems that involve finding subarrays or substrings that meet specific criteria. In our case, we use two pointers to manage the current substring and efficiently adjust the starting position when a repeating character is encountered.
Here's a summary of how the two-pointer technique is used in the optimized solution:
- We start by defining a method called
LengthOfLongestSubstringOptimized
, which takes a strings
as input. - We initialize three variables:
start
: A pointer that marks the start of the current substring without repeating characters.end
: A pointer that marks the end of the current substring without repeating characters.maxLen
: A variable that keeps track of the maximum length of a substring without repeating characters. It's initialized to 0 at the beginning.- We create a
HashSet<char>
calledcharSet
, which will be used to keep track of unique characters within the current substring. - We create a
while
loop, which continues as long as theend
pointer is less than the length of the input strings
. This loop processes the entire string. - Inside the loop, we check if the character at the
end
pointer in the input strings
is not already in thecharSet
. If it's not in the set, that means we have a new character that can be added to our current substring. - We add the character at the
end
pointer to thecharSet
. This indicates that the character is now part of our current substring without repeating characters. - We increment the
end
pointer to move to the next character in the string. - We update the
maxLen
variable by taking the maximum value between the currentmaxLen
and the difference betweenend
andstart
. This difference represents the length of the current non-repeating substring. - Now, if the character is already in the
charSet
, that means we've encountered a repeating character in our substring. In this case, we need to adjust the starting position of the substring to exclude the repeating character. - In the
else
part of the code, we remove the character at thestart
pointer from thecharSet
. This effectively trims the substring from the left, excluding the repeating character. - We increment the
start
pointer to move the starting position of our substring to the right, effectively removing the repeating character from consideration. - This process continues until we've processed the entire string.
- Finally, after processing the entire string, we exit the
while
loop, and themaxLen
variable holds the length of the longest substring without repeating characters. We return this value as the result of the function.
Here's the C# implementation of the optimized approach:
Time and Space Complexity Analysis
Brute Force Approach:
- Time Complexity: O(n^2) where n is the length of the input string
s
. We check all possible substrings. - Space Complexity: O(min(n, m)) where m is the size of the character set. In the worst case, the space complexity is O(n) when all characters are unique.
Optimized Approach:
- Time Complexity: O(n) where n is the length of the input string
s
. We iterate through the string once. - Space Complexity: O(min(n, m)) where m is the size of the character set. In the worst case, the space complexity is O(n) when all characters are unique.
This optimized approach is more efficient than the brute force method, as it avoids redundant checks and efficiently manages the substrings. The two-pointer technique, using start and end, allows us to adjust the starting position of the substring when a repeating character is encountered, resulting in better performance.
Conclusion
Solving the "Longest Substring Without Repeating Characters" problem is an exciting opportunity to explore different approaches in C#. The brute force approach, while straightforward, is less efficient. The optimized approach, which utilizes a the two-pointer, is more efficient and preferred for practical purposes.
Understanding the choice between hash sets and dictionaries, as well as how they use hash functions, is crucial for solving various coding challenges and developing a deeper appreciation for data structures and algorithms. Additionally, the two-pointer technique is a valuable tool for efficiently managing substrings and subarrays.
Remember to test your code with various test cases to ensure correctness and efficiency. Happy coding!