Solving the "Longest Substring Without Repeating Characters" Problem in C#

Posted on 10/13/2023 by Jacob R Phillips

#csharp #hashset #two-pointer

We'll delve into the fascinating world of data structures, including hash sets and dictionaries, and discuss the concept of the two-pointer technique.

Here is my walkthrough on this challenge:

In this blog post (and YouTube video), we'll embark on a journey to solve a classic LeetCode problem: finding the length of the longest substring without repeating characters. We'll explore both a brute force method and an optimized solution in C#.

The Problem Statement

The challenge we face is to determine the length of the longest substring in a given string s in which no character is repeated.

What is a substring?

LeetCode defines a substring as, “a contiguous non-empty sequence of characters within a string.”. In simple terms, a substring is a smaller part of a longer text or sequence, created by taking consecutive characters or elements from the original without skipping any.

With that out of the way, let’s dive into a few examples to understand the problem:

Example 1:

Input: "abcabcbb"
Output: 3
Explanation: The answer is "abc," with a length of 3.

Example 2:

Input: "bbbbb"
Output: 1
Explanation: The answer is "b," with a length of 1.

Example 3:

Input: "pwwkew"
Output: 3
Explanation: The answer is "wke," with a length of 3.

It's vital to remember that the answer must be a substring, not a subsequence. For instance, "pwke" is a subsequence but not a substring.

HashSets vs. Dictionaries

Now, let's address the critical question: Why do we use a hash set in this problem instead of a dictionary?

HashSets:

Simplicity: Hash sets are designed to store unique elements. They provide constant-time (O(1)) complexity for lookups, making them efficient for checking whether a character is already in the hash set.
Tracking Unique Elements: In this problem, we only need to track unique characters, not associated values or key-value pairs. A hash set is simple and efficient for this purpose.

Dictionaries:

Key-Value Pairs: Dictionaries are useful when you need to associate values with keys. In this problem, using a dictionary to map characters to their positions in the string would introduce additional complexity without significant benefits.
Additional Information: Dictionaries are ideal when you need to store and retrieve additional information related to the elements. However, in this context, we only need to track the presence or absence of characters.

Hash Functions:

Both hash sets and dictionaries use hash functions to store and retrieve data efficiently. A hash function takes an input (e.g., a character) and produces a fixed-size value (a hash code) that represents that input. This hash code is used as an index in an internal data structure (e.g., an array or a hash table) to store and retrieve the associated value (in the case of a dictionary) or the presence of the element (in the case of a hash set). The goal of a good hash function is to distribute the elements uniformly across the available slots, reducing collisions and ensuring efficient lookup.

Brute Force Approach

Before we dive into the optimized solution, let's first explore the brute force approach. This method involves checking all possible substrings in the given string to find the longest one without repeating characters. It's an excellent starting point to appreciate the efficiency of the optimized solution.

The steps for the brute force approach include:

We start by defining a method called LengthOfLongestSubstringOptimized, which takes a string s as input.
We initialize a variable maxLen to 0. This variable will keep track of the maximum length of a substring without repeating characters. It's initialized to 0 at the beginning.
Next, we create a loop that will iterate through each character in the string s as a potential starting character for a substring.
Inside the loop, we create a HashSet<char> called charSet. This set will be used to keep track of unique characters within the current substring.
We initialize an integer variable len to 0. This variable will be used to count the length of the current substring without repeating characters. It is set to 0 at the beginning of each iteration because we are starting with an empty substring.
We create a nested loop that starts at the same position as the outer loop (i) and iterates through the rest of the string using the j variable. This inner loop is used to build the current substring.
Inside the inner loop, we check if the character at the j position in the string s is already in the charSet hash set. If it's in the hash set, that means we've encountered a repeating character in our current substring.
If a repeating character is found, we exit the inner loop using the break statement because we want to stop building the current substring. We have encountered a repeating character, and we need to move on to the next starting position for a new substring.
If the character is not in the charSet, it means it's a new, unique character within the current substring. In that case, we add it to the charSet to keep track of it.
We also increment the len variable to denote that we've added another character to the current substring. This variable will help us keep track of the length of the current non-repeating substring.
Finally, after the inner loop, we calculate the length of the current substring (len) and update the maxLen variable. We use Math.Max(maxLen, len) to ensure that maxLen holds the maximum length found so far. If len is greater than the current maxLen, we update maxLen with the new value.
After all the iterations, the maxLen variable will contain the length of the longest substring without repeating characters within the input string s.
We return maxLen as the result of the function.

Here's the C# code for the brute force approach:

public class Solution 
{
    public int LengthOfLongestSubstring(string s)
    {
        //Brute Force Approach 

        //counter to track maximum substring length
        int maxLength = 0;

        //iteration (loop) over input string
        for(int i = 1; i < s.Length; i++)
        {
            //keep track of current chars
            HashSet<char> charSet = new HashSet<char>();

            //keep track of current substring length
            int length = 0;

            //inner loop to build substring
            for(int j = i; j < s.Length; j++)
            {
                //does HashSet contain char? STOP BUILDING
                if(charSet.Contains(s[j]))
                {
                    break;
                }

                //if not, add it
                charSet.Add(s[j]);

                //increment current substring length
                length++;
            }
        
        //compare if maximum length is greater than current substring length
        maxLength = Math.Max(maxLength, length);
        }

        //return maximum substring length
        return maxLength;
    }
}

Optimized Approach

The brute force approach works but is not the most efficient solution. To improve performance, we can utilize a two-pointer technique to avoid redundant checks and make the solution more efficient.

The Two-Pointer Technique

Now, let's explore the concept of the two-pointer technique, which is an integral part of the optimized solution.

The two-pointer technique involves maintaining two pointers, typically called start and end, and using them to traverse an array or a string. This technique is particularly useful for solving problems that involve finding subarrays or substrings that meet specific criteria. In our case, we use two pointers to manage the current substring and efficiently adjust the starting position when a repeating character is encountered.

Here's a summary of how the two-pointer technique is used in the optimized solution:

We start by defining a method called LengthOfLongestSubstringOptimized, which takes a string s as input.
We initialize three variables:

start: A pointer that marks the start of the current substring without repeating characters.
end: A pointer that marks the end of the current substring without repeating characters.
maxLen: A variable that keeps track of the maximum length of a substring without repeating characters. It's initialized to 0 at the beginning.

We create a HashSet<char> called charSet, which will be used to keep track of unique characters within the current substring.
We create a while loop, which continues as long as the end pointer is less than the length of the input string s. This loop processes the entire string.
Inside the loop, we check if the character at the end pointer in the input string s is not already in the charSet. If it's not in the set, that means we have a new character that can be added to our current substring.
We add the character at the end pointer to the charSet. This indicates that the character is now part of our current substring without repeating characters.
We increment the end pointer to move to the next character in the string.
We update the maxLen variable by taking the maximum value between the current maxLen and the difference between end and start. This difference represents the length of the current non-repeating substring.
Now, if the character is already in the charSet, that means we've encountered a repeating character in our substring. In this case, we need to adjust the starting position of the substring to exclude the repeating character.
In the else part of the code, we remove the character at the start pointer from the charSet. This effectively trims the substring from the left, excluding the repeating character.
We increment the start pointer to move the starting position of our substring to the right, effectively removing the repeating character from consideration.
This process continues until we've processed the entire string.
Finally, after processing the entire string, we exit the while loop, and the maxLen variable holds the length of the longest substring without repeating characters. We return this value as the result of the function.

Here's the C# implementation of the optimized approach:

public class Solution 
{
    public int LengthOfLongestSubstring(string s)
    {
        //Optimized Approach

        //create two pointers 
        int start = 0;
        int end = 0;

        //create maximum substring length counter
        int maxLength = 0;

        //keep track of current chars
        HashSet<char> charSet = new HashSet<char>();

        //iterate (loop) over input string
        while(end < s.Length)
        {
            //if char doesn't exist
            if(!charSet.Contains(s[end]))
            {
                //add it
                charSet.Add(s[end]);

                //increment end pointer
                end++;

                //update maximum substring length
                maxLength = Math.Max(maxLength, (end - start));
            }
            else 
            {
                //if char does exist
                //remove it from HashSet
                //at the index of start, so we do not return subsequence
                charSet.Remove(s[start]);
                
                //increment start pointer (trim input string)
                start++;
            }
        }
        //return maximum substring length
        return maxLength;
    }
}

Time and Space Complexity Analysis

Brute Force Approach:

Time Complexity: O(n^2) where n is the length of the input string s. We check all possible substrings.
Space Complexity: O(min(n, m)) where m is the size of the character set. In the worst case, the space complexity is O(n) when all characters are unique.

Optimized Approach:

Time Complexity: O(n) where n is the length of the input string s. We iterate through the string once.
Space Complexity: O(min(n, m)) where m is the size of the character set. In the worst case, the space complexity is O(n) when all characters are unique.

This optimized approach is more efficient than the brute force method, as it avoids redundant checks and efficiently manages the substrings. The two-pointer technique, using start and end, allows us to adjust the starting position of the substring when a repeating character is encountered, resulting in better performance.

Conclusion

Solving the "Longest Substring Without Repeating Characters" problem is an exciting opportunity to explore different approaches in C#. The brute force approach, while straightforward, is less efficient. The optimized approach, which utilizes a the two-pointer, is more efficient and preferred for practical purposes.

Understanding the choice between hash sets and dictionaries, as well as how they use hash functions, is crucial for solving various coding challenges and developing a deeper appreciation for data structures and algorithms. Additionally, the two-pointer technique is a valuable tool for efficiently managing substrings and subarrays.

Remember to test your code with various test cases to ensure correctness and efficiency. Happy coding!