Facebook Pixel

28. Find the Index of the First Occurrence in a String

EasyTwo PointersStringString Matching
LeetCode ↗

Problem Description

You are given two strings: haystack and needle. Your task is to find the first occurrence of the needle string within the haystack string.

The problem asks you to return the starting index position where needle first appears as a substring in haystack. If needle is not found anywhere in haystack, you should return -1.

For example:

  • If haystack = "hello" and needle = "ll", the function should return 2 because "ll" first appears at index 2 in "hello"
  • If haystack = "aaaaa" and needle = "bba", the function should return -1 because "bba" does not exist in "aaaaa"

The solution uses a straightforward sliding window approach. It iterates through the haystack string and at each position i, checks if the substring starting at i with length equal to needle matches the needle string. The iteration continues for n - m + 1 positions where n is the length of haystack and m is the length of needle, ensuring we don't go out of bounds. When a match is found, the starting index is immediately returned. If no match is found after checking all valid positions, -1 is returned.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we need to find a substring within a larger string, the most natural approach is to check every possible position where the substring could start. Think of it like looking for a word in a sentence - you would scan from left to right, checking if each position could be the beginning of the word you're searching for.

The key insight is that we only need to check positions where there's enough remaining characters to fit the entire needle. If haystack has length n and needle has length m, the last possible starting position would be at index n - m. Any position beyond that wouldn't have enough characters left to match the full needle.

For each valid starting position i, we can extract a substring of length m from haystack starting at position i using slice notation haystack[i : i + m]. We then directly compare this substring with needle. Python's string comparison handles the character-by-character matching for us internally.

This leads us to iterate through positions from 0 to n - m (inclusive), which gives us exactly n - m + 1 positions to check. At each position, we perform the substring extraction and comparison. As soon as we find a match, we can immediately return that index since we're looking for the first occurrence. If we complete the entire loop without finding a match, we know the needle doesn't exist in haystack and return -1.

The beauty of this approach lies in its simplicity - it directly translates our intuitive understanding of substring searching into code without requiring complex data structures or algorithms.

Pattern Learn more about Two Pointers patterns.

Solution Approach

The implementation uses a simple traversal approach to find the first occurrence of needle in haystack.

Algorithm Steps:

  1. Initialize Variables: First, we store the lengths of both strings for efficiency:

    • n = len(haystack) - length of the string we're searching in
    • m = len(needle) - length of the string we're searching for
  2. Set Up the Loop: We iterate through each valid starting position in haystack:

    • The loop runs from i = 0 to i = n - m (inclusive)
    • This gives us exactly n - m + 1 positions to check
    • We use range(n - m + 1) to generate these indices
  3. Check Each Position: At each index i:

    • Extract a substring from haystack using slice notation: haystack[i : i + m]
    • This gives us a substring of length m starting at position i
    • Compare this substring directly with needle using the equality operator ==
  4. Return on First Match:

    • If haystack[i : i + m] == needle evaluates to True, we immediately return i
    • This ensures we return the index of the first occurrence
  5. Handle No Match Case:

    • If the loop completes without finding a match, we return -1
    • This indicates that needle is not present in haystack

Example Walkthrough:

Let's trace through haystack = "hello" and needle = "ll":

  • n = 5, m = 2
  • Loop iterations: i ranges from 0 to 3 (since 5 - 2 + 1 = 4 iterations)
    • i = 0: Check "he" vs "ll" → No match
    • i = 1: Check "el" vs "ll" → No match
    • i = 2: Check "ll" vs "ll" → Match found! Return 2

The time complexity is O((n - m + 1) × m) where the outer loop runs n - m + 1 times and each string comparison takes O(m) time. The space complexity is O(1) as we only use a constant amount of extra space.

Example Walkthrough

Let's walk through finding needle = "abc" in haystack = "ababcab".

Initial Setup:

  • haystack = "ababcab" (length n = 7)
  • needle = "abc" (length m = 3)
  • We'll check positions from index 0 to 4 (n - m = 7 - 3 = 4)

Step-by-step Process:

Position i = 0:
haystack: a b a b c a b
          ^ ^ ^
Extract:  "aba"
Compare:  "aba" == "abc"? → No match, continue

Position i = 1:
haystack: a b a b c a b
            ^ ^ ^
Extract:  "bab"
Compare:  "bab" == "abc"? → No match, continue

Position i = 2:
haystack: a b a b c a b
              ^ ^ ^
Extract:  "abc"
Compare:  "abc" == "abc"? → Match found! Return 2

The function returns 2 as the first occurrence of "abc" starts at index 2.

Another Example - No Match Case:

Let's try needle = "xyz" in haystack = "hello":

  • n = 5, m = 3
  • Check positions 0 to 2 (5 - 3 = 2, so 3 positions total)
i = 0: "hel" != "xyz"
i = 1: "ell" != "xyz"  
i = 2: "llo" != "xyz"

All positions checked, no match found → Return -1

Solution Implementation

1class Solution:
2    def strStr(self, haystack: str, needle: str) -> int:
3        if needle == "":
4            return 0
5
6        haystack_length = len(haystack)
7        needle_length = len(needle)
8
9        if needle_length > haystack_length:
10            return -1
11
12        for start_index in range(haystack_length - needle_length + 1):
13            if haystack[start_index:start_index + needle_length] == needle:
14                return start_index
15
16        return -1
17
1class Solution {
2    public int strStr(String haystack, String needle) {
3        if (needle.isEmpty()) {
4            return 0;
5        }
6
7        int haystackLength = haystack.length();
8        int needleLength = needle.length();
9
10        if (needleLength > haystackLength) {
11            return -1;
12        }
13
14        for (int startIndex = 0; startIndex <= haystackLength - needleLength; startIndex++) {
15            if (haystack.substring(startIndex, startIndex + needleLength).equals(needle)) {
16                return startIndex;
17            }
18        }
19
20        return -1;
21    }
22}
23
1class Solution {
2public:
3    int strStr(string haystack, string needle) {
4        if (needle.empty()) {
5            return 0;
6        }
7
8        int haystackLength = haystack.length();
9        int needleLength = needle.length();
10
11        if (needleLength > haystackLength) {
12            return -1;
13        }
14
15        for (int startIndex = 0; startIndex <= haystackLength - needleLength; startIndex++) {
16            if (haystack.substr(startIndex, needleLength) == needle) {
17                return startIndex;
18            }
19        }
20
21        return -1;
22    }
23};
24
1function strStr(haystack: string, needle: string): number {
2    if (needle.length === 0) {
3        return 0;
4    }
5
6    const haystackLength = haystack.length;
7    const needleLength = needle.length;
8
9    if (needleLength > haystackLength) {
10        return -1;
11    }
12
13    for (let startIndex = 0; startIndex <= haystackLength - needleLength; startIndex++) {
14        if (haystack.slice(startIndex, startIndex + needleLength) === needle) {
15            return startIndex;
16        }
17    }
18
19    return -1;
20}
21

Visualization

AlgoMonster exclusive: step through the algorithm state, not just the final code.

Time and Space Complexity

The time complexity of this code is O((n - m) × m), where n is the length of the haystack string and m is the length of the needle string.

This is because:

  • The outer loop runs n - m + 1 times (iterating through all possible starting positions in the haystack)
  • For each iteration, the string slicing operation haystack[i : i + m] creates a substring of length m, which takes O(m) time
  • The comparison haystack[i : i + m] == needle also takes O(m) time to compare two strings of length m
  • Therefore, the total time complexity is O((n - m + 1) × m), which simplifies to O((n - m) × m)

The space complexity is O(1) if we consider that string slicing in Python creates a new string object temporarily for comparison, but this temporary space doesn't scale with input size beyond the fixed substring length m. Since m is part of the input rather than additional space that grows, the space complexity is constant.

Pattern Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Loop Boundary Calculation

One of the most frequent mistakes is setting up the wrong loop boundary, which can lead to either missing valid positions or causing index out of bounds errors.

Incorrect implementations:

# WRONG: Goes too far and causes index out of bounds
for i in range(n - m + 2):  # Should be n - m + 1
    if haystack[i:i + m] == needle:
        return i

# WRONG: Stops too early and misses valid positions
for i in range(n - m):  # Missing the +1
    if haystack[i:i + m] == needle:
        return i

Why this happens: The correct formula n - m + 1 represents the number of valid starting positions. For example, with haystack = "hello" (n=5) and needle = "lo" (m=2), we need to check positions 0, 1, 2, and 3 (which is 4 positions total = 5 - 2 + 1).

Solution: Always use range(n - m + 1) or carefully validate your boundary conditions with edge cases.

2. Not Handling Edge Cases

Failing to handle special cases like empty strings or when needle is longer than haystack.

Problematic code:

def strStr(haystack: str, needle: str) -> int:
    n = len(haystack)
    m = len(needle)
    # Missing edge case handling
    for i in range(n - m + 1):  # This could be negative!
        if haystack[i:i + m] == needle:
            return i
    return -1

Issues that arise:

  • If needle is empty, should return 0 (by convention)
  • If needle is longer than haystack, n - m + 1 becomes negative, causing unexpected behavior

Solution: Add explicit edge case checks:

def strStr(haystack: str, needle: str) -> int:
    if not needle:  # Empty needle
        return 0
    if len(needle) > len(haystack):  # Needle longer than haystack
        return -1
  
    n = len(haystack)
    m = len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1

3. Inefficient String Comparison

Using character-by-character comparison in a nested loop instead of Python's built-in string comparison.

Inefficient approach:

def strStr(haystack: str, needle: str) -> int:
    n = len(haystack)
    m = len(needle)
  
    for i in range(n - m + 1):
        match = True
        for j in range(m):  # Unnecessary nested loop
            if haystack[i + j] != needle[j]:
                match = False
                break
        if match:
            return i
    return -1

Why it's problematic: While this works, it's more verbose and potentially less efficient than using Python's optimized string slicing and comparison. Python's built-in string operations are implemented in C and are highly optimized.

Solution: Use Python's string slicing as shown in the original solution:

if haystack[i:i + m] == needle:
    return i

4. Using Built-in Methods Incorrectly

While Python provides str.find() and str.index() methods, using them incorrectly or not understanding their behavior can cause issues.

Misunderstanding built-in methods:

def strStr(haystack: str, needle: str) -> int:
    # This works but defeats the purpose of the exercise
    return haystack.find(needle)
  
    # Or using index() which raises ValueError
    try:
        return haystack.index(needle)
    except ValueError:
        return -1

Issue: While these work, they don't demonstrate understanding of the underlying algorithm, which is often the point of the exercise in interviews or learning contexts.

Solution: Implement the logic manually to show algorithmic thinking, but know that haystack.find(needle) is the most Pythonic solution for production code.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

A person thinks of a number between 1 and 1000. You may ask any number questions to them, provided that the question can be answered with either "yes" or "no".

What is the minimum number of questions you needed to ask so that you are guaranteed to know the number that the person is thinking?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More