This is the fourth part of the Find My Phrase series on finding a seed phrase (with code!):
- Part 1: Find the last word in a seed phrase.
- Part 2: Find any word in a seed phrase.
- Part 3: Find a used seed phrase.
- Part 4: Find multiple missing words in a seed phrase. ← We're here
- Part 5: Find a used seed phrase with multiple missing words.
Alright, now that we're able to find any one (1) missing word in a seed phrase, let's take it further and find multiple missing words.
As I mentioned in the first part, there is a limit to how many words you can have missing (in terms of seed phrases you can realistically check).
The more words you have missing, the more time and computing power are required. This is what makes a seed phrase secure. Past a certain number of missing words it becomes unrealistic to determine a seed phrase, because you (and your children and grandchildren) won't be in this world when the search is complete.
But that doesn't mean we can't try! We're going to code something that will be able to determine every valid seed phrase with one, two, three, etc. missing words.
We might not be able to get the end result in a timely fashion but at least we'll have the ability to do it.
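To get a feel for how quickly the work grows, here is a small standalone sketch (not part of findmyphrase.py). The name candidate_count is made up for this illustration, and it assumes every missing word is a full 2048-way choice, which slightly overstates the case where the last word is one of the missing ones.

```python
# Rough size of the search space when several words are missing.
WORDLIST_SIZE = 2048  # number of words in the BIP39 wordlist


def candidate_count(missing_words):
    """Number of word "combinations" to try for this many missing words."""
    return WORDLIST_SIZE ** missing_words


for k in range(1, 5):
    print(f"{k} missing word(s): {candidate_count(k):,} combinations")
```

Each extra missing word multiplies the work by 2048, which is why the search becomes impractical so fast.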
Disclaimer: This is meant to be an educational exercise that uses programming to explore automation. It is not recommended to do this with your own seed phrase without a secure machine. Entering your seed phrase on a device connected to the internet exposes it to potential security threats. If you choose to do so, you fully understand the risks and are liable for the consequences.
Finding Missing Multiple Words in a Seed Phrase
Step 1: Starting New (Again)
I know, again. But we're going to do a bit of re-organizing based on what we learned about libraries and functions in the previous post. Open your findmyphrase.py file and delete everything so that, once again, you're starting from a blank file.
Step 2: Importing Libraries
We're going to use two libraries, hashlib and itertools. No installation needed this time. They are both built-in libraries (i.e. standard with the Python installation).
Add this to the beginning of your blank code file:
import itertools
import hashlib
As you can see, we do this by using the command import and the library name.
Step 3: Beginning our Function
Again, a function is a set of instructions that takes an input and can produce an output.
For our function the input is an incomplete seed phrase.
The output is all the potential seed phrases (considering the missing words and their position).
Let's start by defining our function and its input. We're going to name our function get_possible and the incomplete seed phrase input seed_phrase.
Note: We're assuming something is already stored in the variable seed_phrase when we are defining a function.
Add this after the library import portion of the code:
def get_possible(seed_phrase):
Step 4: Checking Our Seed Phrase
Now this beginning portion may look a little familiar. We first split up the words in the seed phrase (so we're able to work with each word individually).
We want our function to take any input, and our job is to manipulate it to generate an appropriate output.
But that means the input could be anything, and we want to make sure whatever is put in there is what we're expecting.
So we're going to add an if statement checking the length to ensure that it has 12, 15, 18, 21, or 24 words (which is what we expect for a BIP39 seed phrase).
If it does not, the function will print the message "Your seed phrase must be 12, 15, 18, 21, or 24 words. Please place a question mark (?) for missing words." and stop the program.
Add this part to your code:
    #converts seed phrase into a list to be able to interface with each word individually.
    seed_phrase = seed_phrase.split(" ")
    if len(seed_phrase) not in [12, 15, 18, 21, 24]:
        print("Your seed phrase must be 12, 15, 18, 21, or 24 words. Please place a question mark (?) for missing words.")
        raise SystemExit(0)
Step 5: Importing Our Word List
Again, we're going to read our BIP39 word list (that we have in our "english.txt" file in the folder) and save it as a list in the variable word_list. This portion is exactly the same as before.
Add this to your code:
    #opens the "english.txt" file and stores it into variable "english"
    english = open("english.txt")
    #reads the "english.txt" file stored in variable "english" and stores the words in the variable "word_list". Also, changes the variable type to a list.
    word_list = english.read().split("\n")
    #closes the "english.txt" file stored in variable "english" since we don't need it anymore.
    english.close()
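As an aside, the same loading step can be written with a with block, which closes the file automatically. This is only a standalone sketch of an alternative, not the code used in the rest of this post; the helper names parse_word_list and load_word_list are made up here. It also skips blank lines, since splitting on "\n" leaves an empty string at the end of the list when the file ends with a newline.

```python
def parse_word_list(text):
    """Turn the contents of a wordlist file into a list, one word per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_word_list(path="english.txt"):
    """Read the wordlist file; the with block closes it for us."""
    with open(path, encoding="utf-8") as word_file:
        return parse_word_list(word_file.read())


# Small demonstration on an in-memory string instead of the real file.
print(parse_word_list("abandon\nability\nable\n"))
```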
Step 6: Seed Phrase to Indexed Numbers and Binary
Again, this portion is exactly the same as before. We're taking our seed phrase and transforming the words into their index numbers in the BIP39 wordlist. Then we're going to change those index numbers into binary (1's and 0's, with each 1 and 0 called a bit).
As a reminder, each word in the BIP39 wordlist represents an index number (0 - abandon... 2047 - zoo). Each of those numbers can be turned into an 11-bit binary number (00000000000 - 0, 11111111111 - 2047).
Add this to your code:
    #converts seed_phrase (with words) to indexed number in BIP39 wordlist
    seed_phrase_index = [word_list.index(word) if word != "?" else word for word in seed_phrase]
    #converts seed_phrase_index (with numbers) to binary
    seed_phrase_binary = [format(number, "011b") if number != "?" else number for number in seed_phrase_index]
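To see what these two lines produce, here is a tiny standalone sketch. It assumes a stand-in list holding only the first four BIP39 words (so the index numbers match the real list), and the helper name to_binary is made up for this example. It also uses a dictionary lookup instead of word_list.index, which is just a design choice to avoid scanning the list for every word.

```python
# Stand-in for the real list: only the first four BIP39 words (indexes 0 to 3).
first_words = ["abandon", "ability", "able", "about"]


def to_binary(seed_phrase, word_list):
    """Convert each known word to its 11-bit string; keep "?" as a placeholder."""
    index_of = {word: index for index, word in enumerate(word_list)}
    return [
        word if word == "?" else format(index_of[word], "011b")
        for word in seed_phrase.split()
    ]


print(to_binary("abandon ? about", first_words))
# ['00000000000', '?', '00000000011']
```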
Time for a Refresher
Again, based on what we learned in the first post:
Each word in the BIP39 wordlist is represented by an index number of up to 4 digits (0000 - 2047)
Each index number can be represented by an 11-bit binary number (00000000000 - 11111111111)
Thus, a seed phrase in binary format would consist of:
- 12 word seed phrase resulting in 132 bits (12 words x 11 bits)
- 24 word seed phrase resulting in 264 bits (24 words x 11 bits)
When a seed phrase is in binary format it is made up of something called entropy and a checksum:
- 132 bits: 128 bit entropy + 4 bit checksum
- 264 bits: 256 bit entropy + 8 bit checksum
Your entropy is made up of the bits from previous words (partial entropy) and some remaining bits we're going to call "missing bits":
- 128 bit entropy: 121 bit partial entropy (11 words x 11 bits) + 7 missing bits
- 256 bit entropy: 253 bit partial entropy (23 words x 11 bits) + 3 missing bits
The checksum is calculated by inputting your entropy in the SHA256 function:
- 128 bits -> 4 bit checksum
- 256 bits -> 8 bit checksum
Thus, the last word (represented by 11 bits) is made up of the checksum and the last missing bits of your entropy:
- 11 bit last word of 12 words seed phrase: 7 missing bits + 4 bit checksum
- 11 bit last word of 24 words seed phrase: 3 missing bits + 8 bit checksum
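The refresher only lists the 12 and 24 word cases. The standalone sketch below works out the same numbers for every length our function accepts, using the BIP39 rule that the checksum has one bit for every three words. The function name bit_layout is made up for this example.

```python
def bit_layout(num_words):
    """Return (total bits, entropy bits, checksum bits, missing bits in the last word)."""
    total_bits = num_words * 11            # 11 bits per word
    checksum_bits = num_words // 3         # 4 bits for 12 words ... 8 bits for 24 words
    entropy_bits = total_bits - checksum_bits
    missing_bits = 11 - checksum_bits      # entropy bits that live in the last word
    return total_bits, entropy_bits, checksum_bits, missing_bits


for num_words in (12, 15, 18, 21, 24):
    print(num_words, bit_layout(num_words))
```

Because it uses whole-number division (//), the result stays an integer without needing int().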
Step 7: Calculating the Number of Missing Bits
The number of missing bits depends on the length of the seed phrase and can be calculated as 11 - (number of words in the seed phrase / 3). For example, a 12 word seed phrase has 11 - 4 = 7 missing bits. We'll store the result in the variable num_missing_bits.
Add this to your code:
    #calculates the number of missing bits based on length of seed phrase
    num_missing_bits = int(11-(1/3)*(len(seed_phrase)))
Step 8: Calculating the Possible Bits for a Word
Let's say we were missing a word (other than the last word). We'd be missing 11 bits in that position of our entropy.
We could try every 11-bit binary number in that position and calculate the checksum for each.
Thus, we want all the possible bit combinations for a missing word (which represent all of the possible words in the BIP39 wordlist).
The code below generates every possible 11-bit value (00000000000 to 11111111111) and stores them in possible_word_bits.
Add this to your code:
    #calculates all the possible bits for a missing word
    possible_word_bits = (bin(x)[2:].rjust(11, "0") for x in range(2**11))
Step 9: If...Else
As a reminder, we're trying to be able to find any word of a seed phrase (it could be the first, second, third, etc. or it could be the last word).
From our first entry, we learned that the last word is calculated based on the previous words. So if the last word was missing, we could simply calculate what it could be based on the previous words.
From our second entry, we learned that if a different word was missing, we'd have to try all 2048 words in the BIP39 wordlist in that position and calculate the last word to see if it matches.
So, with an if...else statement, we've got to check whether we have a last word or not:
If we do have a last word, we have missing bits and a checksum to compare against. We're going to save the missing bits (in missing_bits_possible) and the checksum, which together make up the 11 bits of the last word.
If we don't have a last word, then every possible value of the missing bits is a candidate. We'll generate all the possible missing bits based on num_missing_bits and save them to missing_bits_possible. We'll also create an empty ("") checksum since we will have to calculate that from our possible entropies.
Add this to your code:
    if seed_phrase_binary[-1] != "?": #if the last word is not "?"
        missing_bits_possible = (seed_phrase_binary[-1][0:num_missing_bits],) #save the leftover bits
        checksum = seed_phrase_binary[-1][-(11-num_missing_bits):] #save the checksum
    else:
        #calculates all the possible permutation of missing bits for entropy
        missing_bits_possible = (bin(x)[2:].rjust(num_missing_bits, "0") for x in range(2**num_missing_bits)) # calculate all the possible leftover bits
        checksum = "" #empty checksum
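Here is a small standalone sketch of both branches, in case the slicing is hard to picture. The helper names split_last_word and all_missing_bits are made up for this example; the sample word is "about" (index 3, so 00000000011 in binary).

```python
def split_last_word(last_word_bits, num_words):
    """Split the 11 bits of a known last word into (missing bits, checksum)."""
    num_missing_bits = 11 - num_words // 3
    return last_word_bits[:num_missing_bits], last_word_bits[num_missing_bits:]


def all_missing_bits(num_words):
    """Every possible value of the missing bits when the last word is unknown."""
    num_missing_bits = 11 - num_words // 3
    return [format(x, "b").rjust(num_missing_bits, "0") for x in range(2 ** num_missing_bits)]


print(split_last_word("00000000011", 12))  # ('0000000', '0011')
print(split_last_word("00000000011", 24))  # ('000', '00000011')
print(all_missing_bits(24))                # 8 values, from '000' to '111'
```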
Step 10: All the Possible Missing Word "Combinations"
This is where it gets tricky. We want to be able to try every "combination" of words in the missing locations.
For example, if we were missing two words, we'd want to try the first word in the BIP39 wordlist in both positions: "abandon, abandon".
Then, we'd want to try the first and second word: "abandon, ability". And so on and so forth for all the words.
Technically, we want the Cartesian product of the BIP39 wordlist with itself. This means a "combination" takes into account the order of the words (e.g. "abandon, ability" is different from "ability, abandon") and allows repetition of the words (e.g. "abandon, abandon").
So we're first going to get every "combination" of missing words (by themselves) based on the number of words missing. We'll input them into our seed phrase in the next step.
The code below will do what we described: find every "combination" of words depending on how many words are missing, but in binary (since we're working with a seed phrase in binary form).
For example, the "combinations" to check with two missing words are (00000000000, 00000000000); (00000000000, 00000000001); etc.
Add this to your code:
    #determine all the possible bit "combinations" (cartesian product) depending on the number of missing words
    possible_word_bits_combination = (combination for combination in itertools.product(possible_word_bits,repeat=seed_phrase[:-1].count("?")))
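If itertools.product is new to you, this standalone sketch shows how it behaves on a tiny stand-in list of three 2-bit strings instead of the 2048 real 11-bit strings (the variable names are just for illustration).

```python
import itertools

# Stand-in for the 2048 possible 11-bit strings.
tiny_bits = ["00", "01", "10"]

# Two "missing words": every ordered pair, repeats allowed.
pairs = list(itertools.product(tiny_bits, repeat=2))
print(len(pairs))  # 9, which is 3 x 3
print(pairs[:4])   # [('00', '00'), ('00', '01'), ('00', '10'), ('01', '00')]

# No missing words (repeat=0): a single empty "combination".
print(list(itertools.product(tiny_bits, repeat=0)))  # [()]
```

That last case is why the function still works when only the last word is missing: there is exactly one (empty) "combination" to process.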
Step 11: Inputting the Possible "Combinations" in Partial Entropy
Now we're going to input all those "combinations" into the missing word locations of our seed phrase (minus the last word, which, as we discussed, depends on the previous words). This is our partial_entropy.
Add this code below:
    #input all the "combinations" into the binary form of the seed phrase (minus the last word), also known as your partial entropy
    partial_entropy = tuple("".join((combination.pop(0) if word == "?" else seed_phrase_local[index] for index,word in enumerate(seed_phrase_local))) if (seed_phrase_local := seed_phrase_binary[:-1]) and (combination := list(word_bits_combination)) else "".join(seed_phrase_local) for word_bits_combination in possible_word_bits_combination)
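That one-liner packs a lot in. As a reading aid, here is a standalone sketch of what it does for a single "combination". The helper name fill_missing is made up for this example and is not used elsewhere in the code.

```python
def fill_missing(words_binary, combination):
    """Replace each "?" (left to right) with the next bit string from combination."""
    replacements = iter(combination)
    return "".join(
        next(replacements) if bits == "?" else bits
        for bits in words_binary
    )


known = ["00000000000", "?", "00000000011", "?"]
print(fill_missing(known, ("11111111111", "00000000001")))
```

The first "?" receives the first item of the "combination", the second "?" receives the second item, and the known words are passed through untouched.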
Step 12: Inputting the Missing Bits to Complete Entropy
Finally, we're going to add the missing bits to the end of our partial entropy to complete our 128- to 256-bit entropy (entropy_possible).
Remember if we had a last word, we know exactly what those missing bits are (because we saved them!).
If we did not have a last word, it could be any potential missing bit "combination" based on the length of missing bits that we calculated earlier.
Add this code below:
    #adds the missing bits to complete the entropy
    entropy_possible = tuple(bit_combination + missing_bits for missing_bits in missing_bits_possible for bit_combination in partial_entropy)
Step 13: Calculating the Checksum
Now that we have our possible entropies, we're going to use them to calculate the checksums. To do this, we'll put those entropies through the SHA256 function.
Earlier in our code, there was an if else statement that checked if there was a last word or not:
- If we had the last word, it saved the checksum. We would have a checksum to "check" against. If the checksum calculated from the entropy matched the checksum in the last word, it would be a valid possible seed phrase (This helps cut down on the number of potential seed phrases).
- If we did not have the last word, it saved an empty checksum. Thus, every single checksum calculated is a valid possible seed phrase.
The code below calculates the checksum (calc_checksum) for each possible entropy and adds it to the end of that entropy (completing the seed phrase) if either:
- It matches the checksum saved earlier (i.e. there was a last word)
- The saved checksum is empty (i.e. there was no last word, thus all checksums are valid)
Add this to your code:
    #input each entropy_possible in the SHA256 function to result in the corresponding checksum
    seed_phrase_binary_possible = (entropy + calc_checksum for entropy in entropy_possible if checksum == (calc_checksum := format(hashlib.sha256(int(entropy, 2).to_bytes(len(entropy) // 8, byteorder="big")).digest()[0],"08b")[:11-num_missing_bits]) or checksum == "")
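The checksum calculation is easier to follow when it is pulled out on its own. Below is a standalone sketch; the function name bip39_checksum is made up for this example. It relies on the BIP39 rule that the checksum is the first (entropy length / 32) bits of the SHA256 hash, which is the same length as 11 - num_missing_bits in our code.

```python
import hashlib


def bip39_checksum(entropy_bits):
    """Return the checksum bits for an entropy given as a string of 1's and 0's."""
    # Turn the bit string into raw bytes (128 bits -> 16 bytes, 256 bits -> 32 bytes).
    entropy_bytes = int(entropy_bits, 2).to_bytes(len(entropy_bits) // 8, byteorder="big")
    # The checksum is never longer than 8 bits, so the first byte of the hash is enough.
    first_byte = hashlib.sha256(entropy_bytes).digest()[0]
    checksum_length = len(entropy_bits) // 32
    return format(first_byte, "08b")[:checksum_length]


print(bip39_checksum("0" * 128))  # 0011
print(bip39_checksum("0" * 256))  # 01100110
```

The all-zero 128-bit entropy is a handy check: its 7 missing bits (0000000) plus the checksum (0011) give 00000000011, which is index 3, the word "about".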
Thus, we will have all the possible seed phrases in binary form (seed_phrase_binary_possible).
Step 14: Back to Words
Now that we have all of the potential seed phrases in binary form, let's turn it back into the readable, word form we all know and love. We'll store it in seed_phrase_possible.
Add this to your code:
    #transforms all of the seed phrases in binary form back to word form
    seed_phrase_possible = tuple(" ".join([word_list[int(binary[i:i+11],2)] for i in range(0, len(binary), 11)]) for binary in seed_phrase_binary_possible)
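To make the slicing concrete, here is a standalone sketch that cuts a bit string into 11-bit chunks and looks each one up. It again assumes a stand-in list of only the first four BIP39 words, and the helper name bits_to_indices is made up for this example.

```python
def bits_to_indices(bits):
    """Cut a bit string into 11-bit chunks and convert each chunk to a number."""
    return [int(bits[i:i + 11], 2) for i in range(0, len(bits), 11)]


# Stand-in for the real list: only the first four BIP39 words (indexes 0 to 3).
first_words = ["abandon", "ability", "able", "about"]

bits = "00000000000" + "00000000011" + "00000000001"
indices = bits_to_indices(bits)
print(indices)                                      # [0, 3, 1]
print(" ".join(first_words[i] for i in indices))    # abandon about ability
```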
Step 15: Outputting Our Possible Seed Phrases
Remember, this is a function. When we put in an input (an incomplete seed phrase), we want an output (all the possible seed phrases).
So finally, let's output (return) our seed_phrase_possible.
    return seed_phrase_possible
Putting it Together
We have our function that takes in a seed phrase with missing words, and outputs all the possible seed phrases based on the BIP39 wordlist. The code in its entirety is below:
import itertools
import hashlib

def get_possible(seed_phrase):
    #converts seed phrase into a list to be able to interface with each word individually.
    seed_phrase = seed_phrase.split(" ")
    if len(seed_phrase) not in [12, 15, 18, 21, 24]:
        print("Your seed phrase must be 12, 15, 18, 21, or 24 words. Please place a question mark (?) for missing words.")
        raise SystemExit(0)

    #opens the "english.txt" file and stores it into variable "english"
    english = open("english.txt")
    #reads the "english.txt" file stored in variable "english" and stores the words in the variable "word_list". Also, changes the variable type to a list.
    word_list = english.read().split("\n")
    #closes the "english.txt" file stored in variable "english" since we don't need it anymore.
    english.close()

    #converts seed_phrase (with words) to indexed number in BIP39 wordlist
    seed_phrase_index = [word_list.index(word) if word != "?" else word for word in seed_phrase]
    #converts seed_phrase_index (with numbers) to binary
    seed_phrase_binary = [format(number, "011b") if number != "?" else number for number in seed_phrase_index]

    #calculates the number of missing bits based on length of seed phrase
    num_missing_bits = int(11-(1/3)*(len(seed_phrase)))

    #calculates all the possible bits for a missing word
    possible_word_bits = (bin(x)[2:].rjust(11, "0") for x in range(2**11))

    if seed_phrase_binary[-1] != "?": #if the last word is not "?"
        missing_bits_possible = (seed_phrase_binary[-1][0:num_missing_bits],) #save the leftover bits
        checksum = seed_phrase_binary[-1][-(11-num_missing_bits):] #save the checksum
    else:
        #calculates all the possible permutation of missing bits for entropy
        missing_bits_possible = (bin(x)[2:].rjust(num_missing_bits, "0") for x in range(2**num_missing_bits)) # calculate all the possible leftover bits
        checksum = "" #empty checksum

    #determine all the possible bit "combinations" (cartesian product) depending on the number of missing words
    possible_word_bits_combination = (combination for combination in itertools.product(possible_word_bits,repeat=seed_phrase[:-1].count("?")))

    #input all the "combinations" into the binary form of the seed phrase (minus the last word), also known as your partial entropy
    partial_entropy = tuple("".join((combination.pop(0) if word == "?" else seed_phrase_local[index] for index,word in enumerate(seed_phrase_local))) if (seed_phrase_local := seed_phrase_binary[:-1]) and (combination := list(word_bits_combination)) else "".join(seed_phrase_local) for word_bits_combination in possible_word_bits_combination)

    #adds the missing bits to complete the entropy
    entropy_possible = tuple(bit_combination + missing_bits for missing_bits in missing_bits_possible for bit_combination in partial_entropy)

    #input each entropy_possible in the SHA256 function to result in the corresponding checksum
    seed_phrase_binary_possible = (entropy + calc_checksum for entropy in entropy_possible if checksum == (calc_checksum := format(hashlib.sha256(int(entropy, 2).to_bytes(len(entropy) // 8, byteorder="big")).digest()[0],"08b")[:11-num_missing_bits]) or checksum == "")

    #transforms all of the seed phrases in binary form back to word form
    seed_phrase_possible = tuple(" ".join([word_list[int(binary[i:i+11],2)] for i in range(0, len(binary), 11)]) for binary in seed_phrase_binary_possible)

    return seed_phrase_possible
Step 16: Testing it Out
We're going to test this function out with this seed phrase: "? entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby".
Note: Do not test this with more than one missing word. The capability is there, but it will take significantly longer because the number of possibilities grows so quickly.
Add this to your code:
seed_phrase = "? entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby"
print(get_possible(seed_phrase))
Save the code and run it.
You should get this result:
('crawl entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'element entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'insane entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'million entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'rally entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'release entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'roof entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'symptom entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'timber entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby', 'weapon entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect baby')
As you can see, the first word is different for each seed phrase. These are all the valid possible seed phrases.
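Why only ten results out of 2048 candidates? Here is a rough back-of-the-envelope sketch, assuming the last word is known and that SHA256 output behaves like random bits, so that about 1 in 2^(checksum bits) candidates passes the checksum test. The function name expected_valid_phrases is made up for this estimate.

```python
def expected_valid_phrases(missing_words, num_words):
    """Average number of candidates expected to pass the checksum (last word known)."""
    candidates = 2048 ** missing_words
    checksum_bits = num_words // 3
    return candidates / 2 ** checksum_bits


print(expected_valid_phrases(1, 24))  # 8.0
print(expected_valid_phrases(1, 12))  # 128.0
```

For our 24 word test the estimate is 8, so getting ten is in the right neighborhood. The exact count varies from phrase to phrase.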
The Next Step
You might be thinking, "Wow, this seems like a lot!". You'd be right, it is.
Also, you might be asking, "what am I going to do with all these possible seed phrases?".
Well, if you read the previous part of the series, we have a function that takes in seed phrases and outputs which one has been used (i.e. has made transactions).
So if we put these two functions together, we'll be able to input an incomplete seed phrase, get all the possible seed phrases, and output only the seed phrase that has been used.
We'll discuss that in the next part of our series: Find a Used Seed Phrase with Multiple Missing Words.