Deepa Pandey

Word Break Problem

Problem Statement

In the word break problem, you will be given a string, say s, and a dictionary of strings say wordDict. The wordDict will contain several strings.

Now, our job is to determine if we can break or segment the string using the words in the dictionary. If we can do that, then we return a True, otherwise, we return a False. We will better understand this through examples ahead.

Must Note: You must note that, while we are breaking the string, we can reuse the same word in the dictionary multiple times. That is, picking the same word multiple times from the dictionary, for forming the string S is allowed.

Task: Break the words in the string S in such a way that, the segmented words are present in the dictionary. Finally, return True, if it is possible to break the word in that manner. Otherwise, return a False.

Input Format: You will be given the string S. And a dictionary of strings wordDict.

Example

Let us look into some examples to understand the problem better.

Example 1:

Input:

s = "storybook"
wordDict = ["story","book"]

Output:

True

Explanation: The above input is “storybook”. It can be split into “story” and ” book”. So, since it can be split, we return a True as output.

Example 2:

Input:

s = "monkeyandonkey"
wordDict = ["monkey","donkey","and"]

Output:

False

Explanation: The above input is “monkeyandonkey”. It can be split into “monkey”, “and” and “onkey”. Or, say “monkey”, ” an” and “donkey” (there can be several other splits made). However, please note that it cannot be formed from the words given in the dictionary. For example, if we combine them, it will result in “monkeyanddonkey” instead of “monkeyandonkey”. Please notice the extra ‘d’ here. So, a false is returned.

Constraints

The constraints for the problem are given below:-

Constraints:

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Intuition for the problem

Solving word break is something like taking the word pieces from the word dictionary and trying to construct our target word.

Have you ever played a puzzle game? In that case, you have already solved wordbreak.

intution-for-the-problem

You can think of the pieces of the puzzle as the words in the dictionary, and the final puzzle as the complete word.

To solve the word game here, we can do two things:

  1. Start with the valid word (say “storybook”) and try to break it down into the words in the dictionary. This approach is also popularly known as the Top-down Approach.
  2. The second approach you can follow is, to start from an empty word, and keep on building the words till you find that yes, that word exists in the dictionary. This approach is also popularly known as the Bottom Up Approach.

If you can break them down, then you can confirm that, yes word-break is possible and return a True. Otherwise, return a False, because word-break is not possible.

Approach 1: Brute Force – Recursion & Backtracking

Let us begin the analysis of the problem with the brute force approach first. So, as you can see, this problem is majorly about picking or not picking every substring of the word given to us, in the string s and verifying whether that substring exists in the wordDict dictionary or not. This check we will have to perform till the end of the string.

By the above analysis, you might have understood that this can be considered a recursion and backtracking problem. Why? Because you will pick up every possible substring and at each step recursively check whether that substring is a valid word break or not.

But why recursion? Simply because, at every step, we are going to repeatedly ask the same question again and again whether it is a valid wordbreak or not, on a smaller and smaller version of the substring.

Algorithm:

  • Declare a recursive function that takes the string, wordDict and the starting position of the string as input.
  • The base case will be to check whether our whole string is exhausted, in that case, we can return a true. Because, in that case, we can conclude that the whole string is present in the dictionary.
  • Then, to find out the solution, we check every possible prefix of the string in the dictionary of words.
  • If the substring is found in the worddict, we recursively call our function for the remaining portion of our string.
  • If the substring is not found and all the substrings are being checked, then we simply return a False and exit from our function.

Implementation in C++, Java and Python

C++ Implementation

Below given is the C++ code for the above brute force approach.

Code:

class WordBreak {
public:
    
    // function that will perform word break
    // and return True or False
    bool wordBreak(string s, vector<string>& wordDict) {
        
        // set containing the words 
        // from the worddict dictionary
        // given as input
        set<string> wordSet(wordDict.begin(), wordDict.end());
        
        // calling the recursive function
        //  that will perform wordbreak 
        return wordBreakRecursion(s, wordSet, 0);
    }

    // recursive function to perform wordbreak
    // and return True or False
    bool wordBreakRecursion(string& s, set<string>& wordSet, int start) {
        
        //base case to check whether
        // the whole string is exhausted,
        // in that case return a `true`
        if (start == s.length()) {
            return true;
        }
        
        // check every possible prefix
        // of the string in the 
        // dictionary of words.
        for (int end = start + 1; end <= s.length(); end++) {
            
           // If substring is found in the `worddict`, 
           // recursively call the function
           // for the remaining portion
           // of the string.
            if (wordSet.find(s.substr(start, end - start)) != wordSet.end() and
                wordBreakRecursion(s, wordSet, end)) {
                return true;
            }
        }
        
        // If the substring is not found and
        // all the substrings has being checked
        // return a `False`
        return false;
    }
};

Python Implementation

Below given is the Python code for the above brute force approach.

Code:

class WordBreak:
    
    def wordBreak(self, s, wordDict):
        
        # function that will perform word break
        # and return True or False
        def wordBreakRecursion(str, wordDict, start):
        
            # base case to check whether
            # the whole string is exhausted,
            # in that case return a `true`
            if start == len(s):
                return True
            
            # check every possible prefix
            # of the string in the 
            # dictionary of words.
            for end in range(start + 1, len(s) + 1):
                
            # If substring is found in the `worddict`, 
           # recursively call the function
           # for the remaining portion
           # of the string.
                if s[start:end] in wordDict and wordBreakRecursion(s, wordDict, end):
                    return True
            
        # If the substring is not found and
        # all the substrings has being checked
        # return a `False`
            return False

        return wordBreakRecursion(s, set(wordDict), 0)

#driver code

# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)

Output:

True

Java Implementation

Below given is the Java code for the above brute force approach.

Code:

public class WordBreak {
    
    // function that will call wordBreakRecursion
    // and return True or False on its basis
    public boolean wordBreak(String s, List<String> wordDict) {
        return wordBreakRecursion(s, new HashSet<>(wordDict), 0);
    }
    
    // recursive function to perform wordbreak
    // and return True or False
    private boolean wordBreakRecursion(String s, Set<String> wordDict, int start) {
        
        //base case to check whether
        // the whole string is exhausted,
        // in that case return a `true`
        if (start == s.length()) {
            return true;
        }
        
        // check every possible prefix
        // of the string in the 
        // dictionary of words.
        for (int end = start + 1; end <= s.length(); end++) {
            
            // If substring is found in the `worddict`, 
           // recursively call the function
           // for the remaining portion
           // of the string and finally 
           // return a True if all recursive 
           // calls are returning True
            if (wordDict.contains(s.substring(start, end)) && wordBreakRecursion(s, wordDict, end)) {
                return true;
            }
        }
        
        // If the substring is not found and
        // all the substrings has being checked
        // return a `False`
        return false;
    }
}

Time Complexity Analysis

As you can see, our above-implemented solution is a recursive solution. Also, for a given word of length $N$ (suppose), there are $N+1$ ways to split it into two parts. So, for every step, we will have two choices — either to split the string or not split the string. Also, in the worst case, we might have to check all the possibilities, thereby applying split and not-split for every substring.

So, in simple words, the time taken in the worst case will be $O(2^{\text{n+1}})$, which is equivalent to $O(2^n)$.

Space Complexity Analysis

As you might already know, the space taken by a recursive solution is equivalent to the recursive stack space, here also we have a similar situation. In this problem, the recursive stack can grow up to a maximum of depth $N$, where $N$ is the length of our word. Hence, the worst-case space taken will be equivalent to $O(N)$.

Approach 2: Recursion with Memoization

Recursion is cool because it breaks our most complicated problems into very simple subproblems. But, for a few of the recursive problems, there is a cost incurred with it — multiple calls to the same sub-problem. Let us understand it further with an example.
Did you notice one thing in our previous approach? We had multiple calls to the same sub-problem, although it was already solved in some recursion steps.

recursion-with-memoization

For example, in the above image, suppose we have a string s = "abcd". Now, in the recursive diagram you can see, how we have broken the problem into several subproblems. But, one thing to notice here is that there are multiple calls made to the same function, which were already computed. For example, WB("cd") and WB("d"), are called multiple times in our code, although we have already computed them. As the size of our input will grow, the number of repeating sub-problems will also grow.

So, to avoid the above issue, we can use the memoization technique. Basically, in the memorization method, an array is used to store the results of the already computed sub-problems. Now, when the function is again called for any particular string, the value is fetched and returned from that array (only if its value has already been evaluated).

With memoization, many redundant subproblems are avoided and the recursion tree is pruned, thus it reduces the time complexity by a large factor.

Let us look into the code below.

Implementation in C++, Java and Python

C++ Implementation

Below given is the C++ code for the above recursion with the memoization approach.

Code:

class WordBreak {
public:
    bool wordBreak(string s, vector<string>& wordDict) {
        set<string> wordDictionary(wordDict.begin(), wordDict.end());
        // In the memo table, -1 stands 
        // for the state not yet reached,
        // 0 for false and 1 for true
        vector<int> memo(s.length(), -1);
        return memoizeWordBreak(s, wordDictionary, 0, memo);
    }

    // recursive function to perform wordbreak
    // and return True or False
    bool memoizeWordBreak(string& s, set<string>& wordDictionary, int start, vector<int>& memo) {
        
        //base case to check whether
        // the whole string is exhausted,
        // in that case return a `true`
        if (start == s.length()) {
            return true;
        }
        
        // if out memo table contains any value 
        // apart from -1 then that means, 
        // that subproblem is already evaluated, 
        // so simply return its value
        if (memo[start] != -1) {
            return memo[start];
        }
        
        // check every possible prefix
        // of the string in the 
        // dictionary of words.
        for (int end = start + 1; end <= s.length(); end++) {
            
            // recursively call the function
           // for the remaining portion
           // of the string and finally 
           // return a True if all recursive 
           // calls are returning True
            if (wordDictionary.find(s.substr(start, end - start)) != wordDictionary.end() and
                memoizeWordBreak(s, wordDictionary, end, memo)) {
                return memo[start] = true;
            }
        }
        
        // finally check if the memo[start],
        //  which is the end of the string,
        // is true or false.
        // Return whatever it is as the result
        return memo[start] = false;
    }
};

Python Implementation

Below given is the Python code for the above recursion with the memoization approach.

Code:

class WordDictionary:
    # function that will call MemoizeWordBreak
    # and return True or False on its basis
    def word_break(self, s, wordDict):
        
        # @lru_cache is a decorator that
        # caches the values of the subproblems
        # already computed and prevents 
        # calling them again
        @lru_cache
        
        # recursive problem to compute word_break
        def MemoizeWordBreak(s, word_dict, start):
            
        # base case to check whether
        # the whole string is exhausted,
        # in that case return a `true`
            if start == len(s):
                return True
            
        # check every possible prefix
        # of the string in the 
        # dictionary of words.
            for end in range(start + 1, len(s) + 1):
                
                # If substring is found in the `worddict`, 
           # recursively call the function
           # for the remaining portion
           # of the string
                if s[start:end] in word_dict and MemoizeWordBreak(s, word_dict, end):
                    return True
            
        # If the substring is not found and
        # all the substrings has being checked
        # return a `False`
            return False

        # calling MemoizeWordBreak
        return MemoizeWordBreak(s, frozenset(wordDict), 0)
        
#driver code

# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)

Output:

True

Java Implementation

Below given is the Java code for the above recursion with the memoization approach.

Code:

public class WordBreak {
    
    // function that will call wordBreakMemo
    // and return True or False on its basis
    public boolean word_Break(String s, List<String> wordDict) {
        return wordBreakMemo(s, new HashSet<>(wordDict), 0, new Boolean[s.length()]);
    }

    // recursive function to perform wordbreak
    // and return True or False
    private boolean wordBreakMemo(String s, Set<String> wordDict, int start, Boolean[] memo) {
        
        //base case to check whether
        // the whole string is exhausted,
        // in that case return a `true`
        if (start == s.length()) {
            return true;
        }
        if (memo[start] != null) {
            return memo[start];
        }
        
        // check every possible prefix
        // of the string in the 
        // dictionary of words.
        for (int end = start + 1; end <= s.length(); end++) {
            // If substring is found in the `worddict`, 
           // recursively call the function
           // for the remaining portion
           // of the string
            if (wordDict.contains(s.substring(start, end)) && wordBreakMemo(s, wordDict, end, memo)) {
                return memo[start] = true;
            }
        }
        return memo[start] = false;
    }
}

Time Complexity Analysis

In the above code, we will not have to compute the repetitive sub-problems, leading to a drastic improvement in the time complexity. Firstly the size of the recursion tree can grow up to $N^2$, considering $N$ to be the size of the string. At each step of recursion, we will also be computing the substring, which will take another $O(N)$. Hence the worst-case time complexity would be equal to $O(N^3)$.

Space Complexity Analysis

Since you may already be aware, that the space occupied by a recursive solution is equal to the space occupied by the recursive stack, the situation here is also comparable. The recursive stack in this problem can expand to a maximum depth of $N$, where $N$ is the length of our word. The worst-case space taken will therefore be $O(N)$.

Approach 3 – Using Breadth First Search

The next approach we will see is using the breadth-first search to solve this problem.

Here we consider the string to be a tree, with each node representing the prefix up to the end index. We can visualize the string as a tree, where every node will represent the prefix up to the end index. Let us look into the algorithm to solve using this approach:

Algorithm:

  • Firstly, we will create a set of the wordDict to store all the unique words of the wordDict passed to us.
  • Also, we will take two variables, q (which is a deque to store the end index of the string which is present in the wordDict) and visited (which will store the starting index of the string which is present in the wordDict).
  • We will run a loop through the q, and at each step, we will do the following:
    • Pop the left element from the q into the start variable.
    • If start is already present in the visited list, we will not do anything, and skip that iteration.
    • Otherwise, we run a for-loop named end (say) through start till the length of the string i.e. len(s).
    • Inside the loop we check if the substring starting from start till end is present in the wordDict or not.
    • If the substring is present, then we add the end index of the substring into our q data structure.
    • Else we skip and do nothing
    • At any point if the end variable becomes equal to the length of the string, i.e. len(s), then return a True and come out of the iterations.
  • We will also add the start to our visited data structure every time we come out of the loop.

Let us not look at the implementation below to understand our code better.

Implementation in C++, Java and Python

C++ Implementation

Below given is the C++ code for the above breadth-first search approach.

Code:

class WordBreak {
public:
    // function to compute word break
    bool word_Break(string s, vector<string>& wordDict) {

        // storing word_set in a set
        set<string> word_set(wordDict.begin(), wordDict.end());


        // q is a queue to store the end 
        // index of the string which is
        // present in the `wordDict`
        queue<int> q;

        // visited stores the starting index
        //  of the string which is 
        //  present in the `wordDict`
        vector<bool> visited(s.length(), false);
        q.push(0);

        // iterate till q becomes empty
        while (!q.empty()) {
            int start = q.front();
            q.pop();

            // If `start` is already present 
            // in the `visited` list, do not do
            // anything, and skip that iteration.
            if (visited[start]) {
                continue;
            }

            // run a for-loop named `end` 
            // through `start` till the
            // length of the string
            for (int end = start + 1; end <= s.length(); end++) {

                 // check if the substring starting from `start`
                 //  till `end` is present in 
                 //  the `wordDict` or not.
                if (word_set.find(s.substr(start, end - start)) !=
                    word_set.end()) {
                    q.push(end);


                    // if the `end` variable becomes equal
                    // to the length of the string,
                    //  then return `True`
                    if (end == s.length()) {
                        return true;
                    }
                }
            }
            visited[start] = true;
        }
        return false;
    }
};

Python Implementation

Below given is the Python code for the above breadth-first search approach.

Code:

class WordBreak:

    // function to compute word break
    def word_Break(self, s: str, wordDict: List[str]) -> bool:

        # store the given wordDict in a set word_dictionary
        word_dictionary = set(wordDict)

        # q is a dequeue to store the end 
        # index of the string which is
        # present in the `wordDict`
        q = deque()

        # visited stores the starting index
        #  of the string which is 
        #  present in the `wordDict`
        visited = set()

        # append starting index of s to q
        q.append(0)

        # iterate till any element is present in q
        while q:
            # pop from the left of q and store 
            # the value in start
            start = q.popleft()

            # If `start` is already present 
            # in the `visited` list, do not do
            # anything, and skip that iteration.
            if start in visited:
                continue

            # run a for-loop named `end` 
            # through `start` till the
            # length of the string
            for end in range(start + 1, len(s) + 1):

                # check if the substring starting from `start`
                 #  till `end` is present in 
                 #  the `wordDict` or not.
                if s[start:end] in word_dictionary:
                    q.append(end)

                    # if the `end` variable becomes equal
                    # to the length of the string,
                    #  then return `Tru#
                    if end == len(s):
                        return True
            visited.add(start)
        return False

#driver code

# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)

Output:

True

Java Implementation

Below given is the Java code for the above breadth-first search approach.

Code:

public class WordBreak {
    // function to compute word break
    public boolean word_Break(String s, List<String> wordDict) {

        // storing word_set in a set
        Set<String> wordDictSet = new HashSet<>(wordDict);

        // q is a queue to store the end 
        // index of the string which is
        // present in the `wordDict`
        Queue<Integer> queue = new LinkedList<>();

        // visited stores the starting index
        //  of the string which is 
        //  present in the `wordDict`
        boolean[] visited = new boolean[s.length()];
        queue.add(0);

        // iterate till q becomes empty
        while (!queue.isEmpty()) {
            int start = queue.remove();

            // If `start` is already present 
            // in the `visited` list, do not do
            // anything, and skip that iteration.
            if (visited[start]) {
                continue;
            }

            // run a for-loop named `end` 
            // through `start` till the
            // length of the string
            for (int end = start + 1; end <= s.length(); end++) {

                // check if the substring starting from `start`
                 // till `end` is present in 
                 // the `wordDict` or not.
                if (wordDictSet.contains(s.substring(start, end))) {
                    queue.add(end);

                    // if the `end` variable becomes equal
                    // to the length of the string,
                    //  then return `True`
                    if (end == s.length()) {
                        return true;
                    }
                }
            }
            visited[start] = true;
        }
        return false;
    }
}

Time Complexity Analysis

Let us suppose, the length of the string is $N$.

If we observe the code, we have basically three iterations going on —

  • First is the while loop which iterates through the q data structure. In the worst case, it can run till $N$ times.
  • Second if the for-loop which runs from start till the length of the string. It can also run to approximately $N$ times in the worst case.
  • Third is the substring we take out at every step. It will take another $O(N)$ time

So, overall we have the worst case time complexity is $O(N^3)$

Space Complexity Analysis

We are using a queue data structure, visited list, and another data structure to store the dictionary into a set. So, we can consider that the overall worst-case space taken will be $O(N)$.

NOTE:
The memoization strategy we learned earlier was more like DFS since after finding a match, we search all the way to the end of the string before returning and repeating the process for any further matches with the same start index. But in BFS, it is important to find all possible matches for the start index (i.e., the same level) before moving on to examine neighbouring indexes and so forth. But the concept is the same. Also, there is no such improvement in terms of time or space here in the BFS approach.

Approach 4: Using Dynamic Programming

We have already used the memoization approach earlier in the article where we had a memo table to cache all our subproblems that were already solved. However, in that approach, the problem was that we were using the recursion stack space. So, let us now optimise that approach and cut down that extra recursive stack space. So, in this approach, we will be using dynamic programming to solve our extra stack space. Let us look at the algorithm.

Algorithm:

  • First we create a dp array of size $n+1$, considering $”n”$ to be the length of the given string.
  • We initialize the dp array with false. Only, the first index of the dp array is kept as True, since the $null$ string will always be present in the dictionary.
  • After that, we also take two index pointers i and j.
  • We run a for-loop through i which starts from index 1, and go through the length of the string, i.e. len(s). Inside the loop, we run another loop for j which runs till i.
    • Inside the j loop, we check if the dp[j] is True
    • And if it is True, then we also check if the substring s[i:j] is present in the dictionary or not.
    • If s[i:j] is present, then we simply mark dp[i] as True and break from the loop
  • Finally, we return the value present at the $nth$ index of the dp array. If it is True, then yes the wordbreak can be formed, otherwise, no the wordbreak cannot be formed.

Implementation in C++, Java and Python

C++ Implementation

Below given is the C++ code for the above dynamic programming approach.

Code:

class WordBreak {
public:

    // function to compute wordbreak
    bool word_break(string s, vector<string>& wordDict) {

        // store the worddict in a set
        set<string> word_dictionary(wordDict.begin(), wordDict.end());

        // create a `dp` array of size s.length()
        vector<bool> dp(s.length() + 1);

       // only the first index of the `dp` array is kept as True, 
       // since the null string will 
       // always be present in the dictionary.
        dp[0] = true;

        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i; j++) {

                // check if dp[j] is True, and if it is True,
                // also check if the substring `s[i:j]` is
                // present in the dictionary or not.
                if (dp[j] and
                    word_dictionary.find(s.substr(j, i - j)) != word_dictionary.end()) {

                   // If `s[i:j]` is present,
                   // mark `dp[i]` as `True` 
                   // and break from the loop 
                    dp[i] = true;
                    break;
                }
            }
        }

        //  Return the value present at the 
        //  nth index of the `dp` array, either true or false
        return dp[s.length()];
    }
};

Python Implementation

Below given is the Python code for the above dynamic programming approach.

Code:

class WordBreak:
    # function to compute wordbreak
    def word_break(self, s, wordDict):
        word_dictionary = set(wordDict)

        # create a `dp` array of size s.length()
        dp = [False] * (len(s) + 1)

        # only the first index of the `dp` array is kept as True, 
       # since the null string will 
       # always be present in the dictionary.
        dp[0] = True

        for i in range(1, len(s) + 1):
            for j in range(i):

                # check if dp[j] is True, and if it is True,
                # also check if the substring `s[i:j]` is
                # present in the dictionary or not.
                if dp[j] and s[j:i] in word_dictionary:

                     # If `s[i:j]` is present,
                   # mark `dp[i]` as `True` 
                   # and break from the loop 
                    dp[i] = True
                    break

        #  Return the value present at the 
        #  nth index of the `dp` array, either true or false
        return dp[len(s)]

#driver code

# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)

Output:

True

Java Implementation

Below given is the Java code for the above dynamic programming approach.

Code:

public class WordBreak {

    // function to compute wordbreak
    public boolean word_break(String s, List<String> wordDict) {

        // store the worddict in a set
        Set<String> word_dictionary_set = new HashSet<>(wordDict);

        // create a `dp` array of size s.length()
        boolean[] dp = new boolean[s.length() + 1];

        // only the first index of the `dp` array is kept as True, 
       // since the null string will 
       // always be present in the dictionary.
        dp[0] = true;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i; j++) {

                // check if dp[j] is True, and if it is True,
                // also check if the substring `s[i:j]` is
                // present in the dictionary or not.
                if (dp[j] && word_dictionary_set.contains(s.substring(j, i))) {

                    // If `s[i:j]` is present,
                   // mark `dp[i]` as `True` 
                   // and break from the loop 
                    dp[i] = true;
                    break;
                }
            }
        }

        //  Return the value present at the 
        //  nth index of the `dp` array, either true or false
        return dp[s.length()];
    }
}

Time Complexity Analysis

In the above dynamic programming approach, we are iterating through two for-loops which are taking the time of $O(N^2)$, considering $N$ to be the length of the string. Also, at each step, we will be computing the substring of the given string, which will take another $O(N)$.
Hence, the worst-case time complexity for this approach will be $O(N^3)$

Space Complexity Analysis

Since we have used a dp array of size N+1, the worst-case space complexity will be $O(N)$.

Approach 5: Using Trie Data Structure

Firstly, let us understand what is a trie data structure.

Tries are a very unique and practical data structure that is built on a string's prefix. Strings are kept in the data structure Trie, and it looks like a graph. There are nodes and edges in it. At most 26 children can be found in each node, and edges join each parent node to its children. These 26 pointers are just pointers for each of the 26English alphabetic characters. Every edge is kept as a separate edge. It is always a good choice to optimize the search for substrings using a Trie Data Structure.

Explaining the implementation of the Trie data structure is not under the scope of this article. Let us head on to our code for the same.

Algorithm:
Here we will only discuss the steps we have performed in the wordBreak recursive function. Implementation of TrieNode is not under the scope of the article.

  • First find thelength of the string given
  • If the length of the string is 0 then return a True because Null character will always be present in the word dictionary given.
  • Next, run a loop through the length of the string
  • Check if the prefix of the string is present in the Trie or not.
  • If the prefix is present, then again run the recursive function through the remaining length of the string.
  • Finally return a True if all the prefixes of the string s has been checked and they are present in the word dictionary.

Implementation in C++, Java and Python

C++ Implementation

Below given is the C++ code using the Trie data structure.

Code:

#include <iostream>
using namespace std;

struct TrieNode {
    struct TrieNode* children[26];

    // wordEnd is true if the node represents
    // end of a word
    bool wordEnd;
};

// to return a new trie node
struct TrieNode* getNode(void)
{
    struct TrieNode* trie_node = new TrieNode;
    trie_node->wordEnd = false;
    for (int i = 0; i < 26; i++)
        trie_node->children[i] = NULL;
    return trie_node;
}


// inserting into the trieNode
void insert(struct TrieNode* root, string key)
{
    struct TrieNode* tNode = root;

    for (int i = 0; i < key.length(); i++) {
        int index = key[i] - 'a';
        if (!tNode->children[index])
            tNode->children[index] = getNode();

        tNode = tNode->children[index];
    }

    // mark last node as leaf
    tNode->wordEnd = true;
}

// Returns true if key presents in trie, else
// false
bool search(struct TrieNode* root, string key)
{
    struct TrieNode* tNode = root;

    for (int i = 0; i < key.length(); i++) {
        int index = key[i] - 'a';
        if (!tNode->children[index])
            return false;

        tNode = tNode->children[index];
    }

    return (tNode != NULL && tNode->wordEnd);
}

// returns true if string can be broken down
// into the words in the dictionary
bool wordBreak(string str, TrieNode* root)
{
    int size = str.size();

    // Base case
    if (size == 0)
        return true;

    // Try all prefixes of lengths from 1 to size
    for (int i = 1; i <= size; i++) {
        if (search(root, str.substr(0, i))
            && wordBreak(str.substr(i, size - i), root))
            return true;
    }

    // after computing all prefixes and
    // not finding the result, return false
    return false;
}

Python Implementation

Below given is the Python code using the Trie data structure.

Code:

class WordBreak(object):
    def word_break(self, s, wordDict):

        # initializing our TrieNode class
        class TrieNode(object):
            def __init__(self):
                self.children = []
                self.leafNode = False

            # implementing the method for getting a 
            # TrieNode from our implemented TrieNode 
            def getTrieNode(self):
                temp = TrieNode()
                temp.children = []

                # 26 represents the number 
                # of alphabets present 
                # in the English
                for i in range(26):
                    temp.children.append(None)
                temp.leafNode = False
                return temp

            # to search a node into our TrieNode
            def search(self, root, x):
                tNode = root
                for i in x:
                    idx = ord(i)-97
                    if(tNode.children[idx] == None):
                        return False
                    tNode = tNode.children[idx]
                if(tNode and tNode.leafNode):
                    return True

            # to insert a TrieNode into our Trie
            def insertTrieNode(self, root, x):
                x = str(x)
                tNode = root
                for i in x:
                    index = ord(i)-97
                    if(tNode.children[index] == None):
                        # node has to be initialised
                        tNode.children[index] = self.getTrieNode()
                    tNode = tNode.children[index]
                tNode.leafNode = True # marking end of word


        # this function checks if it is possible 
        # to do a wordbreak
        def isPossibleWordBreak(strr, root):
            l = len(strr)
            if(l == 0):
                return True
            for i in range(1, l+1):
                if(root.search(root, strr[:i]) and isPossibleWordBreak(strr[i:], root)):
                    return True
            return False

#driver code

# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)

Output:

True

Java Implementation

Below given is the Java code using the Trie data structure.

Code:

import java.io.*;
import java.util.*;

class WordBreak {

    static class TrieNode {
        TrieNode children[];

        // isEnd is true if the node
        // represents the end of the word
        boolean isEnd;


        public TrieNode()
        {
            children = new TrieNode[26]; // 26 is the number of alphabets
            for (int i = 0; i < 26; i++)
                children[i] = null;

            isEnd = false;
        }
    }

    // If the key is the prefix of trie, mark
    // it as the leaf node
    // otherwise insert the key into the trie
    static void insert(TrieNode root, String x)
    {
        TrieNode tNode = root;

        for (int i = 0; i < x.length(); i++) {
            int idx = x.charAt(i) - 'a';
            if (tNode.children[idx] == null)
                tNode.children[idx] = new TrieNode();

            tNode = tNode.children[idx];
        }

        // Mark last node as leaf
        tNode.isEnd = true;
    }

    // If key is present, it returns a True
    // Otherwise returs a false
    static boolean search(TrieNode root, String x)
    {
        TrieNode tNode = root;

        for (int i = 0; i < x.length(); i++) {
            int idx = x.charAt(i) - 'a';
            if (tNode.children[idx] == null)
                return false;

            tNode = tNode.children[idx];
        }
        return (tNode != null && tNode.isEnd);
    }

    // Checks whether the words can be broken
    // into the words of the dictionary
    // Returns True if possible
    // Otherwise a False is returned
    static boolean wordBreak(String s, TrieNode root)
    {
        int l = s.length();

        if (l == 0)
            return true;


        for (int i = 1; i <= l; i++) {
            if (search(root, s.substring(0, i))
                && wordBreak(s.substring(i, l), root))
                return true;
        }


        return false;
    }

Time Complexity Analysis

The Insert and search operation in Trie usually costs $O(keyLength)$. The time complexity for the above operation will be major $O(N^2)$ because we will be performing the search operation inside the for-loop. Please note, N is the length of the string i.e. len(s).

Space Complexity Analysis

The space complexity for the above approach will be $O(N + M)$, where N is the length of the string i.e. len(s) and M is the number of words in the wordDict.

Conclusion

In this article, we have learned about the word break problem. Let us now summarize, what we have seen throughout:

  • In the word break problem, you will be given a string, say “s”, and a dictionary of strings say “wordDict”. You have to return true if s can be segmented into a space-separated sequence of one or more dictionary words.
  • For the brute force approach, we can choose the recursion and backtracking to solve the problem. However, the time complexity will become exponential in this case.
  • For an improvement for the above approach, we can memoize our code. That is, we will store the results of the already computed subproblems and improve the time complexity to a great extent.
  • Another similar approach will be by using the breadth-first search. There will be no improvement over the optimisation but it is another good approach.
  • Next, we moved on to the Dynamic Programming approach, where we omitted the extra stack space taken by the recursive solutions as well.
  • Finally, we learned about another approach to solve the problem – using the trie data structure. It improved the time and space complexity of the solution.

Author