Problem Statement
In the word break problem, you will be given a string, say s, and a dictionary of strings say wordDict. The wordDict will contain several strings.
Now, our job is to determine if we can break or segment the string using the words in the dictionary. If we can do that, then we return a True, otherwise, we return a False. We will better understand this through examples ahead.
Must Note: You must note that, while we are breaking the string, we can reuse the same word in the dictionary multiple times. That is, picking the same word multiple times from the dictionary, for forming the string S is allowed.
Task: Break the words in the string S in such a way that, the segmented words are present in the dictionary. Finally, return True, if it is possible to break the word in that manner. Otherwise, return a False.
Input Format: You will be given the string S. And a dictionary of strings wordDict.
Example
Let us look into some examples to understand the problem better.
Example 1:
Input:
s = "storybook"
wordDict = ["story","book"]
Output:
True
Explanation: The above input is “storybook”. It can be split into “story” and ” book”. So, since it can be split, we return a True as output.
Example 2:
Input:
s = "monkeyandonkey"
wordDict = ["monkey","donkey","and"]
Output:
False
Explanation: The above input is “monkeyandonkey”. It can be split into “monkey”, “and” and “onkey”. Or, say “monkey”, ” an” and “donkey” (there can be several other splits made). However, please note that it cannot be formed from the words given in the dictionary. For example, if we combine them, it will result in “monkeyanddonkey” instead of “monkeyandonkey”. Please notice the extra ‘d’ here. So, a false is returned.
Constraints
The constraints for the problem are given below:-
Constraints:
- 1 <= s.length <= 300
- 1 <= wordDict.length <= 1000
- 1 <= wordDict[i].length <= 20
- s and wordDict[i] consist of only lowercase English letters.
- All the strings of wordDict are unique.
Intuition for the problem
Solving word break is something like taking the word pieces from the word dictionary and trying to construct our target word.
Have you ever played a puzzle game? In that case, you have already solved wordbreak.
You can think of the pieces of the puzzle as the words in the dictionary, and the final puzzle as the complete word.
To solve the word game here, we can do two things:
- Start with the valid word (say “storybook”) and try to break it down into the words in the dictionary. This approach is also popularly known as the Top-down Approach.
- The second approach you can follow is, to start from an empty word, and keep on building the words till you find that yes, that word exists in the dictionary. This approach is also popularly known as the Bottom Up Approach.
If you can break them down, then you can confirm that, yes word-break is possible and return a True. Otherwise, return a False, because word-break is not possible.
Approach 1: Brute Force – Recursion & Backtracking
Let us begin the analysis of the problem with the brute force approach first. So, as you can see, this problem is majorly about picking or not picking every substring of the word given to us, in the string s and verifying whether that substring exists in the wordDict dictionary or not. This check we will have to perform till the end of the string.
By the above analysis, you might have understood that this can be considered a recursion and backtracking problem. Why? Because you will pick up every possible substring and at each step recursively check whether that substring is a valid word break or not.
But why recursion? Simply because, at every step, we are going to repeatedly ask the same question again and again whether it is a valid wordbreak or not, on a smaller and smaller version of the substring.
Algorithm:
- Declare a recursive function that takes the string, wordDict and the starting position of the string as input.
- The base case will be to check whether our whole string is exhausted, in that case, we can return a true. Because, in that case, we can conclude that the whole string is present in the dictionary.
- Then, to find out the solution, we check every possible prefix of the string in the dictionary of words.
- If the substring is found in the worddict, we recursively call our function for the remaining portion of our string.
- If the substring is not found and all the substrings are being checked, then we simply return a False and exit from our function.
Implementation in C++, Java and Python
C++ Implementation
Below given is the C++ code for the above brute force approach.
Code:
class WordBreak {
public:
// function that will perform word break
// and return True or False
bool wordBreak(string s, vector<string>& wordDict) {
// set containing the words
// from the worddict dictionary
// given as input
set<string> wordSet(wordDict.begin(), wordDict.end());
// calling the recursive function
// that will perform wordbreak
return wordBreakRecursion(s, wordSet, 0);
}
// recursive function to perform wordbreak
// and return True or False
bool wordBreakRecursion(string& s, set<string>& wordSet, int start) {
//base case to check whether
// the whole string is exhausted,
// in that case return a `true`
if (start == s.length()) {
return true;
}
// check every possible prefix
// of the string in the
// dictionary of words.
for (int end = start + 1; end <= s.length(); end++) {
// If substring is found in the `worddict`,
// recursively call the function
// for the remaining portion
// of the string.
if (wordSet.find(s.substr(start, end - start)) != wordSet.end() and
wordBreakRecursion(s, wordSet, end)) {
return true;
}
}
// If the substring is not found and
// all the substrings has being checked
// return a `False`
return false;
}
};
Python Implementation
Below given is the Python code for the above brute force approach.
Code:
class WordBreak:
def wordBreak(self, s, wordDict):
# function that will perform word break
# and return True or False
def wordBreakRecursion(str, wordDict, start):
# base case to check whether
# the whole string is exhausted,
# in that case return a `true`
if start == len(s):
return True
# check every possible prefix
# of the string in the
# dictionary of words.
for end in range(start + 1, len(s) + 1):
# If substring is found in the `worddict`,
# recursively call the function
# for the remaining portion
# of the string.
if s[start:end] in wordDict and wordBreakRecursion(s, wordDict, end):
return True
# If the substring is not found and
# all the substrings has being checked
# return a `False`
return False
return wordBreakRecursion(s, set(wordDict), 0)
#driver code
# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)
Output:
True
Java Implementation
Below given is the Java code for the above brute force approach.
Code:
public class WordBreak {
// function that will call wordBreakRecursion
// and return True or False on its basis
public boolean wordBreak(String s, List<String> wordDict) {
return wordBreakRecursion(s, new HashSet<>(wordDict), 0);
}
// recursive function to perform wordbreak
// and return True or False
private boolean wordBreakRecursion(String s, Set<String> wordDict, int start) {
//base case to check whether
// the whole string is exhausted,
// in that case return a `true`
if (start == s.length()) {
return true;
}
// check every possible prefix
// of the string in the
// dictionary of words.
for (int end = start + 1; end <= s.length(); end++) {
// If substring is found in the `worddict`,
// recursively call the function
// for the remaining portion
// of the string and finally
// return a True if all recursive
// calls are returning True
if (wordDict.contains(s.substring(start, end)) && wordBreakRecursion(s, wordDict, end)) {
return true;
}
}
// If the substring is not found and
// all the substrings has being checked
// return a `False`
return false;
}
}
Time Complexity Analysis
As you can see, our above-implemented solution is a recursive solution. Also, for a given word of length $N$ (suppose), there are $N+1$ ways to split it into two parts. So, for every step, we will have two choices — either to split the string or not split the string. Also, in the worst case, we might have to check all the possibilities, thereby applying split and not-split for every substring.
So, in simple words, the time taken in the worst case will be $O(2^{\text{n+1}})$, which is equivalent to $O(2^n)$.
Space Complexity Analysis
As you might already know, the space taken by a recursive solution is equivalent to the recursive stack space, here also we have a similar situation. In this problem, the recursive stack can grow up to a maximum of depth $N$, where $N$ is the length of our word. Hence, the worst-case space taken will be equivalent to $O(N)$.
Approach 2: Recursion with Memoization
Recursion is cool because it breaks our most complicated problems into very simple subproblems. But, for a few of the recursive problems, there is a cost incurred with it — multiple calls to the same sub-problem. Let us understand it further with an example.
Did you notice one thing in our previous approach? We had multiple calls to the same sub-problem, although it was already solved in some recursion steps.
For example, in the above image, suppose we have a string s = "abcd"
. Now, in the recursive diagram you can see, how we have broken the problem into several subproblems
. But, one thing to notice here is that there are multiple calls made to the same function, which were already computed. For example, WB("cd")
and WB("d")
, are called multiple times in our code, although we have already computed them. As the size of our input will grow
, the number of repeating sub-problems will also grow
.
So, to avoid the above issue, we can use the memoization technique
. Basically, in the memorization method, an array is used to store the results of the already computed sub-problems. Now, when the function is again called for any particular string, the value is fetched and returned from that array (only if its value has already been evaluated).
With memoization, many redundant subproblems are avoided and the recursion tree is pruned, thus it reduces the time complexity by a large factor.
Let us look into the code below.
Implementation in C++, Java and Python
C++ Implementation
Below given is the C++ code for the above recursion with the memoization approach.
Code:
class WordBreak {
public:
bool wordBreak(string s, vector<string>& wordDict) {
set<string> wordDictionary(wordDict.begin(), wordDict.end());
// In the memo table, -1 stands
// for the state not yet reached,
// 0 for false and 1 for true
vector<int> memo(s.length(), -1);
return memoizeWordBreak(s, wordDictionary, 0, memo);
}
// recursive function to perform wordbreak
// and return True or False
bool memoizeWordBreak(string& s, set<string>& wordDictionary, int start, vector<int>& memo) {
//base case to check whether
// the whole string is exhausted,
// in that case return a `true`
if (start == s.length()) {
return true;
}
// if out memo table contains any value
// apart from -1 then that means,
// that subproblem is already evaluated,
// so simply return its value
if (memo[start] != -1) {
return memo[start];
}
// check every possible prefix
// of the string in the
// dictionary of words.
for (int end = start + 1; end <= s.length(); end++) {
// recursively call the function
// for the remaining portion
// of the string and finally
// return a True if all recursive
// calls are returning True
if (wordDictionary.find(s.substr(start, end - start)) != wordDictionary.end() and
memoizeWordBreak(s, wordDictionary, end, memo)) {
return memo[start] = true;
}
}
// finally check if the memo[start],
// which is the end of the string,
// is true or false.
// Return whatever it is as the result
return memo[start] = false;
}
};
Python Implementation
Below given is the Python code for the above recursion with the memoization approach.
Code:
class WordDictionary:
# function that will call MemoizeWordBreak
# and return True or False on its basis
def word_break(self, s, wordDict):
# @lru_cache is a decorator that
# caches the values of the subproblems
# already computed and prevents
# calling them again
@lru_cache
# recursive problem to compute word_break
def MemoizeWordBreak(s, word_dict, start):
# base case to check whether
# the whole string is exhausted,
# in that case return a `true`
if start == len(s):
return True
# check every possible prefix
# of the string in the
# dictionary of words.
for end in range(start + 1, len(s) + 1):
# If substring is found in the `worddict`,
# recursively call the function
# for the remaining portion
# of the string
if s[start:end] in word_dict and MemoizeWordBreak(s, word_dict, end):
return True
# If the substring is not found and
# all the substrings has being checked
# return a `False`
return False
# calling MemoizeWordBreak
return MemoizeWordBreak(s, frozenset(wordDict), 0)
#driver code
# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)
Output:
True
Java Implementation
Below given is the Java code for the above recursion with the memoization approach.
Code:
public class WordBreak {
// function that will call wordBreakMemo
// and return True or False on its basis
public boolean word_Break(String s, List<String> wordDict) {
return wordBreakMemo(s, new HashSet<>(wordDict), 0, new Boolean[s.length()]);
}
// recursive function to perform wordbreak
// and return True or False
private boolean wordBreakMemo(String s, Set<String> wordDict, int start, Boolean[] memo) {
//base case to check whether
// the whole string is exhausted,
// in that case return a `true`
if (start == s.length()) {
return true;
}
if (memo[start] != null) {
return memo[start];
}
// check every possible prefix
// of the string in the
// dictionary of words.
for (int end = start + 1; end <= s.length(); end++) {
// If substring is found in the `worddict`,
// recursively call the function
// for the remaining portion
// of the string
if (wordDict.contains(s.substring(start, end)) && wordBreakMemo(s, wordDict, end, memo)) {
return memo[start] = true;
}
}
return memo[start] = false;
}
}
Time Complexity Analysis
In the above code, we will not have to compute the repetitive sub-problems, leading to a drastic improvement in the time complexity. Firstly the size of the recursion tree can grow up to $N^2$, considering $N$ to be the size of the string. At each step of recursion, we will also be computing the substring, which will take another $O(N)$. Hence the worst-case time complexity would be equal to $O(N^3)$.
Space Complexity Analysis
Since you may already be aware, that the space occupied by a recursive solution is equal to the space occupied by the recursive stack, the situation here is also comparable. The recursive stack in this problem can expand to a maximum depth of $N$, where $N$ is the length of our word. The worst-case space taken will therefore be $O(N)$.
Approach 3 – Using Breadth First Search
The next approach we will see is using the breadth-first search to solve this problem.
Here we consider the string to be a tree, with each node representing the prefix up to the end index. We can visualize the string as a tree, where every node will represent the prefix up to the end index. Let us look into the algorithm to solve using this approach:
Algorithm:
- Firstly, we will create a set of the
wordDict
to store all the unique words of thewordDict
passed to us. - Also, we will take two variables, q (which is a deque to store the end index of the string which is present in the
wordDict
) and visited (which will store the starting index of the string which is present in thewordDict
). - We will run a loop through the q, and at each step, we will do the following:
- Pop the left element from the q into the
start
variable. - If
start
is already present in thevisited
list, we will not do anything, and skip that iteration. - Otherwise, we run a for-loop named
end
(say) throughstart
till the length of the string i.e.len(s)
. - Inside the loop we check if the substring starting from
start
tillend
is present in thewordDict
or not. - If the substring is present, then we add the
end
index of the substring into ourq
data structure. - Else we skip and do nothing
- At any point if the
end
variable becomes equal to the length of the string, i.e.len(s)
, then return aTrue
and come out of the iterations.
- Pop the left element from the q into the
- We will also add the start to our visited data structure every time we come out of the loop.
Let us not look at the implementation below to understand our code better.
Implementation in C++, Java and Python
C++ Implementation
Below given is the C++ code for the above breadth-first search approach.
Code:
class WordBreak {
public:
// function to compute word break
bool word_Break(string s, vector<string>& wordDict) {
// storing word_set in a set
set<string> word_set(wordDict.begin(), wordDict.end());
// q is a queue to store the end
// index of the string which is
// present in the `wordDict`
queue<int> q;
// visited stores the starting index
// of the string which is
// present in the `wordDict`
vector<bool> visited(s.length(), false);
q.push(0);
// iterate till q becomes empty
while (!q.empty()) {
int start = q.front();
q.pop();
// If `start` is already present
// in the `visited` list, do not do
// anything, and skip that iteration.
if (visited[start]) {
continue;
}
// run a for-loop named `end`
// through `start` till the
// length of the string
for (int end = start + 1; end <= s.length(); end++) {
// check if the substring starting from `start`
// till `end` is present in
// the `wordDict` or not.
if (word_set.find(s.substr(start, end - start)) !=
word_set.end()) {
q.push(end);
// if the `end` variable becomes equal
// to the length of the string,
// then return `True`
if (end == s.length()) {
return true;
}
}
}
visited[start] = true;
}
return false;
}
};
Python Implementation
Below given is the Python code for the above breadth-first search approach.
Code:
class WordBreak:
// function to compute word break
def word_Break(self, s: str, wordDict: List[str]) -> bool:
# store the given wordDict in a set word_dictionary
word_dictionary = set(wordDict)
# q is a dequeue to store the end
# index of the string which is
# present in the `wordDict`
q = deque()
# visited stores the starting index
# of the string which is
# present in the `wordDict`
visited = set()
# append starting index of s to q
q.append(0)
# iterate till any element is present in q
while q:
# pop from the left of q and store
# the value in start
start = q.popleft()
# If `start` is already present
# in the `visited` list, do not do
# anything, and skip that iteration.
if start in visited:
continue
# run a for-loop named `end`
# through `start` till the
# length of the string
for end in range(start + 1, len(s) + 1):
# check if the substring starting from `start`
# till `end` is present in
# the `wordDict` or not.
if s[start:end] in word_dictionary:
q.append(end)
# if the `end` variable becomes equal
# to the length of the string,
# then return `Tru#
if end == len(s):
return True
visited.add(start)
return False
#driver code
# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)
Output:
True
Java Implementation
Below given is the Java code for the above breadth-first search approach.
Code:
public class WordBreak {
// function to compute word break
public boolean word_Break(String s, List<String> wordDict) {
// storing word_set in a set
Set<String> wordDictSet = new HashSet<>(wordDict);
// q is a queue to store the end
// index of the string which is
// present in the `wordDict`
Queue<Integer> queue = new LinkedList<>();
// visited stores the starting index
// of the string which is
// present in the `wordDict`
boolean[] visited = new boolean[s.length()];
queue.add(0);
// iterate till q becomes empty
while (!queue.isEmpty()) {
int start = queue.remove();
// If `start` is already present
// in the `visited` list, do not do
// anything, and skip that iteration.
if (visited[start]) {
continue;
}
// run a for-loop named `end`
// through `start` till the
// length of the string
for (int end = start + 1; end <= s.length(); end++) {
// check if the substring starting from `start`
// till `end` is present in
// the `wordDict` or not.
if (wordDictSet.contains(s.substring(start, end))) {
queue.add(end);
// if the `end` variable becomes equal
// to the length of the string,
// then return `True`
if (end == s.length()) {
return true;
}
}
}
visited[start] = true;
}
return false;
}
}
Time Complexity Analysis
Let us suppose, the length of the string is $N$.
If we observe the code, we have basically three iterations going on —
- First is the while loop which iterates through the q data structure. In the worst case, it can run till $N$ times.
- Second if the for-loop which runs from start till the length of the string. It can also run to approximately $N$ times in the worst case.
- Third is the substring we take out at every step. It will take another $O(N)$ time
So, overall we have the worst case time complexity is $O(N^3)$
Space Complexity Analysis
We are using a queue data structure, visited list, and another data structure to store the dictionary into a set. So, we can consider that the overall worst-case space taken will be $O(N)$.
NOTE:
The memoization strategy we learned earlier was more like DFS since after finding a match, we search all the way to the end of the string before returning and repeating the process for any further matches with the same start index. But in BFS, it is important to find all possible matches for the start index (i.e., the same level) before moving on to examine neighbouring indexes and so forth. But the concept is the same. Also, there is no such improvement in terms of time or space here in the BFS approach.
Approach 4: Using Dynamic Programming
We have already used the memoization approach earlier in the article where we had a memo table to cache all our subproblems that were already solved. However, in that approach, the problem was that we were using the recursion stack space. So, let us now optimise that approach and cut down that extra recursive stack space. So, in this approach, we will be using dynamic programming to solve our extra stack space. Let us look at the algorithm.
Algorithm:
- First we create a
dp
array of size $n+1$, considering $”n”$ to be the length of the given string. - We initialize the
dp
array with false. Only, the first index of thedp
array is kept as True, since the $null$ string will always be present in the dictionary. - After that, we also take two index pointers
i
andj
. - We run a for-loop through
i
which starts from index 1, and go through the length of the string, i.e.len(s)
. Inside the loop, we run another loop forj
which runs tilli
.- Inside the
j
loop, we check if thedp[j]
isTrue
- And if it is
True
, then we also check if the substrings[i:j]
is present in the dictionary or not. - If
s[i:j]
is present, then we simply markdp[i]
asTrue
and break from the loop
- Inside the
- Finally, we return the value present at the $nth$ index of the
dp
array. If it isTrue
, then yes the wordbreak can be formed, otherwise, no thewordbreak
cannot be formed.
Implementation in C++, Java and Python
C++ Implementation
Below given is the C++ code for the above dynamic programming approach.
Code:
class WordBreak {
public:
// function to compute wordbreak
bool word_break(string s, vector<string>& wordDict) {
// store the worddict in a set
set<string> word_dictionary(wordDict.begin(), wordDict.end());
// create a `dp` array of size s.length()
vector<bool> dp(s.length() + 1);
// only the first index of the `dp` array is kept as True,
// since the null string will
// always be present in the dictionary.
dp[0] = true;
for (int i = 1; i <= s.length(); i++) {
for (int j = 0; j < i; j++) {
// check if dp[j] is True, and if it is True,
// also check if the substring `s[i:j]` is
// present in the dictionary or not.
if (dp[j] and
word_dictionary.find(s.substr(j, i - j)) != word_dictionary.end()) {
// If `s[i:j]` is present,
// mark `dp[i]` as `True`
// and break from the loop
dp[i] = true;
break;
}
}
}
// Return the value present at the
// nth index of the `dp` array, either true or false
return dp[s.length()];
}
};
Python Implementation
Below given is the Python code for the above dynamic programming approach.
Code:
class WordBreak:
# function to compute wordbreak
def word_break(self, s, wordDict):
word_dictionary = set(wordDict)
# create a `dp` array of size s.length()
dp = [False] * (len(s) + 1)
# only the first index of the `dp` array is kept as True,
# since the null string will
# always be present in the dictionary.
dp[0] = True
for i in range(1, len(s) + 1):
for j in range(i):
# check if dp[j] is True, and if it is True,
# also check if the substring `s[i:j]` is
# present in the dictionary or not.
if dp[j] and s[j:i] in word_dictionary:
# If `s[i:j]` is present,
# mark `dp[i]` as `True`
# and break from the loop
dp[i] = True
break
# Return the value present at the
# nth index of the `dp` array, either true or false
return dp[len(s)]
#driver code
# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)
Output:
True
Java Implementation
Below given is the Java code for the above dynamic programming approach.
Code:
public class WordBreak {
// function to compute wordbreak
public boolean word_break(String s, List<String> wordDict) {
// store the worddict in a set
Set<String> word_dictionary_set = new HashSet<>(wordDict);
// create a `dp` array of size s.length()
boolean[] dp = new boolean[s.length() + 1];
// only the first index of the `dp` array is kept as True,
// since the null string will
// always be present in the dictionary.
dp[0] = true;
for (int i = 1; i <= s.length(); i++) {
for (int j = 0; j < i; j++) {
// check if dp[j] is True, and if it is True,
// also check if the substring `s[i:j]` is
// present in the dictionary or not.
if (dp[j] && word_dictionary_set.contains(s.substring(j, i))) {
// If `s[i:j]` is present,
// mark `dp[i]` as `True`
// and break from the loop
dp[i] = true;
break;
}
}
}
// Return the value present at the
// nth index of the `dp` array, either true or false
return dp[s.length()];
}
}
Time Complexity Analysis
In the above dynamic programming approach, we are iterating through two for-loops which are taking the time of $O(N^2)$, considering $N$ to be the length of the string. Also, at each step, we will be computing the substring of the given string, which will take another $O(N)$.
Hence, the worst-case time complexity for this approach will be $O(N^3)$
Space Complexity Analysis
Since we have used a dp
array of size N+1
, the worst-case space complexity will be $O(N)$.
Approach 5: Using Trie Data Structure
Firstly, let us understand what is a trie data structure.
Tries are a very unique and practical data structure that is built on a string's prefix
. Strings are kept in the data structure Trie
, and it looks like a graph. There are nodes
and edges
in it. At most 26 children
can be found in each node, and edges join each parent node to its children. These 26
pointers are just pointers for each of the 26
English alphabetic characters. Every edge is kept as a separate edge. It is always a good choice to optimize the search for substrings using a Trie Data Structure.
Explaining the implementation of the Trie data structure is not under the scope of this article. Let us head on to our code for the same.
Algorithm:
Here we will only discuss the steps we have performed in the wordBreak recursive function. Implementation of TrieNode is not under the scope of the article.
- First find the
length
of the string given - If the length of the string is 0 then return a
True
because Null character will always be present in the word dictionary given. - Next, run a loop through the
length
of the string - Check if the
prefix of the string
is present in the Trie or not. - If the prefix is present, then again run the recursive function through the remaining length of the string.
- Finally return a
True
if all the prefixes of the strings
has been checked and they are present in the word dictionary.
Implementation in C++, Java and Python
C++ Implementation
Below given is the C++ code using the Trie data structure.
Code:
#include <iostream>
using namespace std;
struct TrieNode {
struct TrieNode* children[26];
// wordEnd is true if the node represents
// end of a word
bool wordEnd;
};
// to return a new trie node
struct TrieNode* getNode(void)
{
struct TrieNode* trie_node = new TrieNode;
trie_node->wordEnd = false;
for (int i = 0; i < 26; i++)
trie_node->children[i] = NULL;
return trie_node;
}
// inserting into the trieNode
void insert(struct TrieNode* root, string key)
{
struct TrieNode* tNode = root;
for (int i = 0; i < key.length(); i++) {
int index = key[i] - 'a';
if (!tNode->children[index])
tNode->children[index] = getNode();
tNode = tNode->children[index];
}
// mark last node as leaf
tNode->wordEnd = true;
}
// Returns true if key presents in trie, else
// false
bool search(struct TrieNode* root, string key)
{
struct TrieNode* tNode = root;
for (int i = 0; i < key.length(); i++) {
int index = key[i] - 'a';
if (!tNode->children[index])
return false;
tNode = tNode->children[index];
}
return (tNode != NULL && tNode->wordEnd);
}
// returns true if string can be broken down
// into the words in the dictionary
bool wordBreak(string str, TrieNode* root)
{
int size = str.size();
// Base case
if (size == 0)
return true;
// Try all prefixes of lengths from 1 to size
for (int i = 1; i <= size; i++) {
if (search(root, str.substr(0, i))
&& wordBreak(str.substr(i, size - i), root))
return true;
}
// after computing all prefixes and
// not finding the result, return false
return false;
}
Python Implementation
Below given is the Python code using the Trie data structure.
Code:
class WordBreak(object):
def word_break(self, s, wordDict):
# initializing our TrieNode class
class TrieNode(object):
def __init__(self):
self.children = []
self.leafNode = False
# implementing the method for getting a
# TrieNode from our implemented TrieNode
def getTrieNode(self):
temp = TrieNode()
temp.children = []
# 26 represents the number
# of alphabets present
# in the English
for i in range(26):
temp.children.append(None)
temp.leafNode = False
return temp
# to search a node into our TrieNode
def search(self, root, x):
tNode = root
for i in x:
idx = ord(i)-97
if(tNode.children[idx] == None):
return False
tNode = tNode.children[idx]
if(tNode and tNode.leafNode):
return True
# to insert a TrieNode into our Trie
def insertTrieNode(self, root, x):
x = str(x)
tNode = root
for i in x:
index = ord(i)-97
if(tNode.children[index] == None):
# node has to be initialised
tNode.children[index] = self.getTrieNode()
tNode = tNode.children[index]
tNode.leafNode = True # marking end of word
# this function checks if it is possible
# to do a wordbreak
def isPossibleWordBreak(strr, root):
l = len(strr)
if(l == 0):
return True
for i in range(1, l+1):
if(root.search(root, strr[:i]) and isPossibleWordBreak(strr[i:], root)):
return True
return False
#driver code
# creating object of WordBreak() class
ob = WordBreak()
# calling our wordBreak function
ans = ob.wordBreak("storybook",["story","book"])
# displaying results
print(ans)
Output:
True
Java Implementation
Below given is the Java code using the Trie data structure.
Code:
import java.io.*;
import java.util.*;
class WordBreak {
static class TrieNode {
TrieNode children[];
// isEnd is true if the node
// represents the end of the word
boolean isEnd;
public TrieNode()
{
children = new TrieNode[26]; // 26 is the number of alphabets
for (int i = 0; i < 26; i++)
children[i] = null;
isEnd = false;
}
}
// If the key is the prefix of trie, mark
// it as the leaf node
// otherwise insert the key into the trie
static void insert(TrieNode root, String x)
{
TrieNode tNode = root;
for (int i = 0; i < x.length(); i++) {
int idx = x.charAt(i) - 'a';
if (tNode.children[idx] == null)
tNode.children[idx] = new TrieNode();
tNode = tNode.children[idx];
}
// Mark last node as leaf
tNode.isEnd = true;
}
// If key is present, it returns a True
// Otherwise returs a false
static boolean search(TrieNode root, String x)
{
TrieNode tNode = root;
for (int i = 0; i < x.length(); i++) {
int idx = x.charAt(i) - 'a';
if (tNode.children[idx] == null)
return false;
tNode = tNode.children[idx];
}
return (tNode != null && tNode.isEnd);
}
// Checks whether the words can be broken
// into the words of the dictionary
// Returns True if possible
// Otherwise a False is returned
static boolean wordBreak(String s, TrieNode root)
{
int l = s.length();
if (l == 0)
return true;
for (int i = 1; i <= l; i++) {
if (search(root, s.substring(0, i))
&& wordBreak(s.substring(i, l), root))
return true;
}
return false;
}
Time Complexity Analysis
The Insert and search operation in Trie usually costs $O(keyLength)$. The time complexity for the above operation will be major $O(N^2)$ because we will be performing the search operation
inside the for-loop. Please note, N
is the length of the string i.e. len(s)
.
Space Complexity Analysis
The space complexity for the above approach will be $O(N + M)$, where N is the length of the string i.e. len(s)
and M
is the number of words in the wordDict
.
Conclusion
In this article, we have learned about the word break problem. Let us now summarize, what we have seen throughout:
- In the word break problem, you will be given a string, say “s”, and a dictionary of strings say “wordDict”. You have to return
true
ifs
can be segmented into a space-separated sequence of one or more dictionary words. - For the brute force approach, we can choose the
recursion and backtracking
to solve the problem. However, the time complexity will become exponential in this case. - For an improvement for the above approach, we can
memoize our code
. That is, we will store the results of the already computed subproblems and improve the time complexity to a great extent. - Another similar approach will be by using the
breadth-first search
. There will be no improvement over the optimisation but it is another good approach. - Next, we moved on to the
Dynamic Programming
approach, where we omitted the extra stack space taken by the recursive solutions as well. - Finally, we learned about another approach to solve the problem – using the
trie data structure
. It improved the time and space complexity of the solution.