You're in a situation where you need to recover your wallet.
You go check your seed phrase only to discover there are missing words.
Oh no. What do you do?
Well luckily, assuming that it is a BIP39 seed phrase, each missing word can only be one of 2048 possible words!
And depending on how many missing words and in what position, there can be up to...
29642774844752946028434172162224104410437116074403984394101141506025761187823616 possible combinations!
Well, actually I lied. Technically not all of those combinations are valid. There are rules to a seed phrase, so only a fraction of them can actually occur...
The last word is special and partially dependent on the information stored in the previous words. For a 24 word seed phrase, only 1 in every 256 combinations passes that check. You're left with:
115792089237316195423570985008687907853269984665640564039457584007913129639936 possible valid combinations!
I know, I know, that's still a lot. But you've got some options now:
- Option A: You find and try every single valid combination.
- Option B: You find and try every single valid combination but write a program to do it for you.
Option A is the "easiest" to do. But, option B is most definitely the fastest.
For the sake of time (and interest), let's go with option B.
Introduction
We will go through how to create a script (an automated series of instructions) by writing a program to find your seed phrase.
This is geared towards those with very little programming knowledge. I, myself, am not a programmer nor a software engineer. As such, I've written this series with the intent that someone with a basic understanding can hopefully follow.
Now I'm not going to teach you the fundamentals of programming, but I will be explaining what I'm trying to achieve for each section of code I write.
So when you inevitably copy and paste, you'll be able to review the code and understand what is happening step by step.
This will be a multi-part series and we will go through how to code different scenarios. I'm going to call this series Find My Phrase and here are the parts below:
- Part 1: Find the last word in a seed phrase. ← We're here
- Part 2: Find any word in a seed phrase.
- Part 3: Find a used seed phrase.
- Part 4: Find multiple missing words in a seed phrase.
- Part 5: Find a used seed phrase with multiple missing words.
Each part will build on the concepts of the previous part and create a comprehensive program that will find your seed phrase.
We'll be utilizing Python, a programming language that is good for automated tasks and prototyping.
There is a limit to how many words you can have missing in terms of seed phrases you can realistically check.
This can take serious computing power and time depending on how many words you have missing. But, this is what makes a seed phrase secure. If it was easy, people could check all the seed phrases.
So missing one or two words isn't a big deal. But more than that can be unrealistic in terms of time or computing power.
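To put rough numbers on that, here is a small sketch of how the search space grows with each missing word. The speed of one million checks per second is a made-up assumption purely for illustration; real speeds depend on your machine and on how each candidate is checked. The counts are also an upper bound, since they ignore the seed phrase rules that rule out some candidates.

```python
# Each missing word can be any of the 2048 words in the BIP39 wordlist.
WORDLIST_SIZE = 2048

# Hypothetical checking speed, assumed only for illustration.
CHECKS_PER_SECOND = 1_000_000


def combinations_to_try(missing_words):
    """Upper bound on the phrases to try, ignoring the seed phrase rules."""
    return WORDLIST_SIZE ** missing_words


for missing in range(1, 5):
    total = combinations_to_try(missing)
    seconds = total / CHECKS_PER_SECOND
    print(f"{missing} missing: {total:,} combinations, about {seconds:,.0f} seconds")
```

Every extra missing word multiplies the work by 2048, which is why the jump from two missing words to four is so steep.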
For this first part, we're going to assume that we're missing the last word.
You might be thinking, well we're just going to try all 2048 words right?
Wrong. Remember we said earlier that the last word is special. We have to use the information stored in the previous words to calculate what the last word could be.
A few caveats to make before we start:
- This assumes the seed phrase in question is a BIP39 seed phrase.
- Bitcoin, at some point in time, was sent to this wallet (even if it’s currently empty).
Let's get started.
Disclaimer: This is meant to be an educational exercise to utilize programming to explore automation. It is not recommended to do this with your own seed phrase without a secure machine. Entering your seed phrase on a device connected to the internet exposes your seed phrase to potential security threats. If you choose to do so, you fully understand the risks and are liable for the consequences.
Finding the Last Word of a Seed Phrase
This will be the seed phrase we'll be working with:
element entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect ?
There are only 23 words. We're missing the last word. Let's find it.
Step 1: Download and Install Python
Can't write a script in Python without Python.
Go ahead and follow the instructions here to download the latest version of Python: https://www.python.org/downloads/
Step 2: Create a Folder to Store All Your Files
Once you've installed Python, we should have all that we need to begin writing your code. But first, let's keep it organized. Create a folder on your desktop. We're going to store all the files we need in there. I'm going to rename my folder "findmyphrase".
Step 3: Create a PY File
A PY file is a Python code file where we are going to write our code.
Start with creating a blank text file. (For Mac open the TextEdit application and save it into your folder. For Windows, right-click -> New -> Text Document).
Then rename the file to have a .py extension at the end and press enter. I'm going to name my file, "findmyphrase.py".
Step 4: Open the PY File
Double-click and open the .py file. When you open the .py file, there will be two windows that pop up. The first window will be showing the results from our code (left). The second window is where we're going to begin writing our code.
Step 5: Create a Variable
The first thing we will do in our code is create something called a "variable" where we can store information. By creating a variable, we will be able to reference the information stored in the variable in our code. The information we will be storing in our variable is our partial seed phrase.
If you notice in the image below, I can make comments using a “#” in the code which won't be "read". This means any text beginning with a "#" will not be considered part of the code and is purely for commenting/reference purposes.
I will be utilizing them to explain each line of the code. They're also useful when going back to read your code to understand what you were trying to do.
To create a variable, we first give it a name. We're going to name our variable, seed_phrase. To store information in our variable, we use an equal sign ( = ) and set the variable equal to something within quotation marks ( " " ).
In this case, we'll set our variable seed_phrase equal to our partial seed phrase and since we're missing the last word, we'll put a question mark ( ? ) where the last word should be: "element entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect ?".
We're also going to turn our variable into a "list" variable type using the command .split(" "). This will separate each individual word but store all of the words in a single variable. This will let us interface with each word individually instead of one long "sentence".
Add this to the window to the right:
#seed phrase separated by spaces
seed_phrase = "element entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect ?"
#converts seed phrase into a list to be able to interface with each word individually.
seed_phrase = seed_phrase.split(" ")
It'll look something like this below:
Step 6: Importing Our Word List
Now to try every single word, we'll need to be able to interface with every single word.
So, we're going to save the entire wordlist into a variable. But, we're not going to bloat our code by typing the entire wordlist in it.
We're going to bring it in from a separate file in our folder.
First, create another text file in our created folder and open it. I'm going to call mine "english.txt".
Copy and Paste the BIP39 wordlist from its official source:
https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt
I hit the "Raw" (circled red above) button and use Ctrl-A (Windows) or Cmd-A (Mac) to select all.
Then copy, Ctrl-C/Cmd-C, and paste, Ctrl-V/Cmd-V, into our newly created text file, "english.txt".
Then save this file.
Now, we're going to write some code. We want to read the "english.txt" file and save the words into a variable that we can utilize.
On the next line (press enter), add the following:
#opens the "english.txt" file and stores it into variable "english"
english = open("english.txt")
#reads the "english.txt" file stored in variable "english" and stores the words in the variable "word_list". Also, changes the variable type to a list.
word_list = english.read().split("\n")
#closes the "english.txt" file stored in variable "english" since we don't need it anymore.
english.close()
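One thing to watch for: if english.txt ends with a blank line, split("\n") leaves an empty string at the end of word_list. The sketch below is an optional alternative and not part of the original script. The function name load_word_list is made up for this example. It uses a with block, which closes the file automatically, and it skips blank lines.

```python
def load_word_list(path):
    """Read one word per line from a text file, skipping blank lines."""
    # "with" closes the file for us, even if something goes wrong while reading.
    with open(path, encoding="utf-8") as english:
        return [line.strip() for line in english if line.strip()]


# Example usage, assuming "english.txt" sits in the same folder as the script:
# word_list = load_word_list("english.txt")
# print(len(word_list))  # the BIP39 English wordlist should contain 2048 words
```

If the printed length is not 2048, something went wrong while copying the wordlist into the file.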
Again, it'll look like the following:
Step 7: Checking Our Code
Now that we have some code, we're going to practice running the program and checking that the result is what we expect.
We're going to utilize the print() function. This function will show a message for anything in-between the parentheses (including the variables we've been using).
Let's add another line in our code to do that:
print(seed_phrase)
This will "print" the information that's stored in seed_phrase and allow you to check it is what you are expecting.
Now save your code by going to the top toolbar: File -> Save.
And run your code by going to the toolbar: Run -> Run Module
You should get this result:
['element', 'entire', 'sniff', 'tired', 'miracle', 'solve', 'shadow', 'scatter', 'hello', 'never', 'tank', 'side', 'sight', 'isolate', 'sister', 'uniform', 'advice', 'pen', 'praise', 'soap', 'lizard', 'festival', 'connect', '?']
It should look like this:
NOTE: Feel free to delete the print(seed_phrase) portion of the code since it's working as expected.
We're going to use the print() function often to check our code throughout this post to ensure the result we're getting is correct. Feel free to delete them afterwards.
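Alongside printing the whole list, a couple of quick checks can confirm it has the shape we expect. This is an optional sketch and not part of the original script. It relies only on Python's built-in len() function and the list method .index().

```python
seed_phrase = "element entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect ?"
seed_phrase = seed_phrase.split(" ")

# A 24 word seed phrase with one unknown word should still have 24 entries.
print(len(seed_phrase))        # 24

# Position of the unknown word, counting from 0 (so 23 means the 24th word).
print(seed_phrase.index("?"))  # 23
```

If the first number is not 24, check the seed phrase for a missing word or a doubled space.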
Step 8: Introduction to the Checksum
Remember when I said there are 2048 possible words? Well I sort of lied.
Only a fraction of those words are actually possible, because the last word contains something called a checksum.
Basically, the last word is partially dependent on the previous words. It's calculated based on the information stored in those previous words and will result in only a handful of potential last words.
There are only 8 potential last words for a 24 word seed phrase (128 for a 12 word seed phrase).
But how do we get these potential 8 words?
We have to put our seed phrase into a unique function (an algorithm, "math", set of instructions, etc.) known as SHA256 in order to determine our potential last words.
But we can't put our seed phrase directly into that function. We need to obtain a specific number from our seed phrase to put into that function.
We need a binary number. A binary number is a number consisting of only 1's and 0's. Each 1 and 0 is known as a bit.
A 24 word seed phrase requires a 256 bit number.
A 12 word seed phrase requires a 128 bit number.
This 256 or 128 bit number is also known as our entropy.
And that entropy is what goes into your SHA256 function to get the information required for the last word.
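The 8 and 128 mentioned earlier come from how the bits are divided up. Every word carries 11 bits, and BIP39 uses 1 checksum bit for every 32 bits of entropy, which works out to 1 checksum bit for every 3 words. Whatever is left of the last word's 11 bits belongs to the entropy, and that decides how many last words are possible. Here is a small sketch of that arithmetic (the function name is made up for this example):

```python
BITS_PER_WORD = 11


def last_word_breakdown(num_words):
    """Return (entropy_bits, checksum_bits, possible_last_words) for a phrase length."""
    total_bits = num_words * BITS_PER_WORD
    checksum_bits = num_words // 3  # 1 checksum bit for every 3 words
    entropy_bits = total_bits - checksum_bits
    free_bits_in_last_word = BITS_PER_WORD - checksum_bits
    return entropy_bits, checksum_bits, 2 ** free_bits_in_last_word


for num_words in (12, 24):
    print(num_words, last_word_breakdown(num_words))
# 12 (128, 4, 128)
# 24 (256, 8, 8)
```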
Now let's get the entropy (256 bits) from our seed phrase to get those potential 8 words.
Step 9: Words To Indexed Numbers
First, we've got to turn our words into numbers and not just any numbers, their indexed number.
If you didn't notice, the BIP39 wordlist is in alphabetical order. If we gave each of those words a number starting from abandon = 0, ability = 1....zoo = 2047, we can turn our seed phrase into numbers. This is their index. But, we don't have to do this manually. Add the following to your code:
#converts seed_phrase (with words) to indexed number in BIP39 wordlist
seed_phrase_index = [word_list.index(word) if word != "?" else word for word in seed_phrase]
print(seed_phrase_index)
This code will look at each word in your seed_phrase, and if it is not a "?" it will look through the BIP39 word_list to find the word number or index in the list.
Save and run your code again. You should get this result:
[573, 604, 1643, 1813, 1131, 1655, 1574, 1539, 854, 1192, 1773, 1599, 1601, 948, 1612, 1899, 32, 1299, 1356, 1645, 1046, 681, 377, '?']
Your seed_phrase_index variable contains the number that each word corresponds to in the BIP39 wordlist.
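One thing to be aware of: word_list.index(word) stops the program with a ValueError if a word is not in the list, for example because of a typo. The sketch below is an optional check to run beforehand and is not part of the original script. The function name and the tiny stand-in word list are made up for this example. In the real script you would pass in seed_phrase and the full word_list.

```python
def find_unknown_words(seed_phrase, word_list):
    """Return the words that are neither '?' nor present in the word list."""
    known = set(word_list)  # a set makes "is this word in the list?" fast
    return [word for word in seed_phrase if word != "?" and word not in known]


# Tiny stand-in list for demonstration only; the real list has 2048 words.
demo_word_list = ["abandon", "ability", "able", "about"]
demo_phrase = ["abandon", "abuot", "able", "?"]

print(find_unknown_words(demo_phrase, demo_word_list))  # ['abuot']
```

An empty list means every word was found and the conversion to indexed numbers should run without errors.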
Step 10: Numbers to Binary
Now this indexed number can also be represented by an 11 bit binary number. We're going to do just that to get 23, 11 bit binary numbers.
Add the following to your code:
#converts seed_phrase_index (with numbers) to binary
seed_phrase_binary = [format(number, "011b") if number != "?" else number for number in seed_phrase_index]
print(seed_phrase_binary)
This code will look at each number in your seed_phrase_index, and if it is not a "?" it will format it into an 11 bit binary number.
Save and run your code. You should get this result:
['01000111101', '01001011100', '11001101011', '11100010101', '10001101011', '11001110111', '11000100110', '11000000011', '01101010110', '10010101000', '11011101101', '11000111111', '11001000001', '01110110100', '11001001100', '11101101011', '00000100000', '10100010011', '10101001100', '11001101101', '10000010110', '01010101001', '00101111001', '?']
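To see what format(number, "011b") is doing, here is a small sketch using two of the indexed numbers from our list. The "011b" part asks for binary ("b"), padded with leading zeros ("0") to a width of 11 characters. The built-in int(text, 2) goes the other way.

```python
# 573 is the index of "element", the first word in our seed phrase.
bits = format(573, "011b")
print(bits)       # 01000111101
print(len(bits))  # 11

# Small numbers are padded with leading zeros so every word is 11 bits long.
print(format(32, "011b"))  # 00000100000

# int(..., 2) reads a string of 1's and 0's back into a regular number.
print(int(bits, 2))  # 573
```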
Step 11: Calculating the Missing Bits
If you combine all of your binary numbers together, you only get 253 bits in total. But you need 256 bits to put in the SHA256 function for a 24 word seed phrase!
Those 3 missing bits can be any permutation of 0 and 1.
In other words it can be all 1's, all 0's, or some mix.
But again, we don't have to do that by hand. Add this next section to your code:
#calculates the number of bits missing for entropy
num_missing_bits = int(11-(1/3)*(len(seed_phrase)))
#calculates all the possible permutation of missing bits for entropy
missing_bits_possible = [bin(x)[2:].rjust(num_missing_bits, "0") for x in range(2**num_missing_bits)]
print(missing_bits_possible)
Save and run. This should be the result:
['000', '001', '010', '011', '100', '101', '110', '111']
There are only 8 potential permutations, which is oddly familiar. That corresponds to the 8 potential words we could have...
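The same list can be produced in other ways. As an optional sketch, not part of the original script, this version uses whole-number division (//) to count the missing bits and itertools.product from Python's standard library to list every mix of 0's and 1's.

```python
from itertools import product

num_words = 24

# One checksum bit per 3 words; the rest of the last word's 11 bits are missing entropy.
num_missing_bits = 11 - num_words // 3
print(num_missing_bits)  # 3

# product("01", repeat=3) yields every 3 character mix of "0" and "1" in order.
missing_bits_possible = ["".join(bits) for bits in product("01", repeat=num_missing_bits)]
print(missing_bits_possible)
# ['000', '001', '010', '011', '100', '101', '110', '111']
```

Whole-number division avoids decimals entirely, so there is no rounding to think about.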
Let's add each potential 3 bits to the rest of our 253 bits.
Step 12: Putting Together Our Entropy
Now that we have all the potential 3 bits, we've got to add it to the end of our 253 bits (23 words x 11 bits) to make up our 256 bit entropy.
Add this code:
#combines the binary representation of seed phrase with each possible missing bits to result in the possible entropy
entropy_possible = ["".join(seed_phrase_binary[:-1])+bits for bits in missing_bits_possible]
print(entropy_possible)
This will combine the 253 bits together and add the potential 3 bits at the end for 8 total options of 256 bits.
You should get this result:
['0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001000', '0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001001', '0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001010', '0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001011', '0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001100', '0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001101', '0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001110', 
'0100011110101001011100110011010111110001010110001101011110011101111100010011011000000011011010101101001010100011011101101110001111111100100000101110110100110010011001110110101100000100000101000100111010100110011001101101100000101100101010100100101111001111']
I know, this looks crazy but if you look carefully the first 253 bits are the 11 bits from each word combined and are the same for each number.
The last 3 bits are different and are each potential 3 bits.
We have 8 potential, 256 bit numbers to put into our SHA256 function.
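A quick way to convince yourself of this is to rebuild the candidates from the indexed numbers in Step 9 and check their lengths. This is an optional sketch and not part of the original script:

```python
# Indexed numbers of the 23 known words (from Step 9).
known_indexes = [573, 604, 1643, 1813, 1131, 1655, 1574, 1539, 854, 1192, 1773, 1599,
                 1601, 948, 1612, 1899, 32, 1299, 1356, 1645, 1046, 681, 377]

# 23 words x 11 bits = 253 known bits.
known_bits = "".join(format(number, "011b") for number in known_indexes)
print(len(known_bits))  # 253

# Add each possible 3 bit ending to get 8 candidates.
entropy_possible = [known_bits + format(x, "03b") for x in range(8)]

print(len(entropy_possible))                                 # 8
print(all(len(e) == 256 for e in entropy_possible))          # True
print(all(e[:253] == known_bits for e in entropy_possible))  # True
```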
Step 12: Calculating the Checksum
We're going to put each 256 bit number into a SHA256 function to get 8 different checksums.
In order to be able to do a SHA256 function in Python, we have to import something called a "module".
A module is essentially another python file (with code) that we can call upon to utilize its capabilities. In this case, we're importing the module "hashlib". The hashlib module is a built-in module i.e. it comes standard with the standard Python installation. This will let us utilize the hashlib module's SHA256 function and put our 256 bit numbers through it.
Add this to your code:
#inputs each entropy_possible in the SHA256 function to result in the corresponding checksum
import hashlib
checksum = [format(hashlib.sha256(int(entropy, 2).to_bytes(len(entropy) // 8, byteorder="big")).digest()[0],"08b")[:11-num_missing_bits] for entropy in entropy_possible]
print(checksum)
Save and run. This should be the result:
['10001000', '11000011', '11110011', '11111111', '00000010', '11110011', '11001001', '10101111']
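The checksum line packs several steps into one. As a sketch of what it is doing, here is the same calculation split into smaller steps. The function name checksum_bits is made up for this example. It is tried out on an entropy made entirely of 0's rather than on our own seed phrase.

```python
import hashlib


def checksum_bits(entropy):
    """Return the BIP39 checksum bits for an entropy given as a string of 1's and 0's."""
    # Step 1: turn the string of bits into raw bytes (8 bits per byte).
    entropy_bytes = int(entropy, 2).to_bytes(len(entropy) // 8, byteorder="big")
    # Step 2: run the bytes through SHA256.
    digest = hashlib.sha256(entropy_bytes).digest()
    # Step 3: write the first byte of the result as 8 bits.
    first_byte_bits = format(digest[0], "08b")
    # Step 4: keep 1 checksum bit for every 32 bits of entropy.
    return first_byte_bits[:len(entropy) // 32]


print(checksum_bits("0" * 256))  # 8 checksum bits, as for a 24 word seed phrase
print(checksum_bits("0" * 128))  # 4 checksum bits, as for a 12 word seed phrase
```

For a 256 bit entropy, step 4 keeps all 8 bits of the first byte, which matches the 8 bit checksums printed above.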
If you noticed, 256 bits will only give you 23 full sets of 11 bits (for 23 words). There are 3 bits leftover. We need 8 additional bits.
Each potential 3 bit number resulted in an 8 bit checksum.
If you combine together the 3 bit number with its corresponding 8 bit checksum, you get 11 bits. That's enough for a word!
We're going to do exactly that.
Step 13: Combining Missing Bits with the Checksum
We're going to be combining each 3 bit possible number with its respective 8 bit checksum to get our last 11 bit word.
Add this to your code:
#combines the missing bits with its corresponding checksum
last_word_bits = [i + j for i, j in zip(missing_bits_possible, checksum)]
print(last_word_bits)
Save and run. This should be the result:
['00010001000', '00111000011', '01011110011', '01111111111', '10000000010', '10111110011', '11011001001', '11110101111']
You'll have eight 11 bit numbers: each possible 3 bit number combined with its respective 8 bit checksum.
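If zip() is new to you: it walks through two lists side by side, pairing the first item of one with the first item of the other, then the second with the second, and so on. Here is a small sketch using the values from the earlier steps:

```python
missing_bits_possible = ['000', '001', '010', '011', '100', '101', '110', '111']
checksum = ['10001000', '11000011', '11110011', '11111111',
            '00000010', '11110011', '11001001', '10101111']

# Show which missing bits get paired with which checksum.
for i, j in zip(missing_bits_possible, checksum):
    print(i, "+", j, "=", i + j)

# Every combined result should be exactly 11 bits long, enough for one word.
last_word_bits = [i + j for i, j in zip(missing_bits_possible, checksum)]
print(all(len(bits) == 11 for bits in last_word_bits))  # True
```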
Step 14: Back to Words
We're going to turn each of these 11 bits back into indexed numbers, and then the word it corresponds to in the BIP39 wordlist.
Add this to your code:
#transforms 11 bit number to indexed number and then the corresponding word in the BIP39 wordlist
last_word = [word_list[int(bits, 2)] for bits in last_word_bits]
print(last_word)
Save and run. This should be the result:
['baby', 'debris', 'fury', 'lend', 'leopard', 'salmon', 'summer', 'vote']
Putting It All Together
We've got the potential last words of our seed phrase!
Here is the entirety of the code we discussed above:
#seed phrase separated by spaces
seed_phrase = "element entire sniff tired miracle solve shadow scatter hello never tank side sight isolate sister uniform advice pen praise soap lizard festival connect ?"
#converts seed phrase into a list to be able to interface with each word individually.
seed_phrase = seed_phrase.split(" ")
#opens the "english.txt" file and stores it into variable "english"
english = open("english.txt")
#reads the "english.txt" file stored in variable "english" and stores the words in the variable "word_list". Also, changes the variable type to a list.
word_list = english.read().split("\n")
#closes the "english.txt" file stored in variable "english" since we don't need it anymore.
english.close()
#converts seed_phrase (with words) to indexed number in BIP39 wordlist
seed_phrase_index = [word_list.index(word) if word != "?" else word for word in seed_phrase]
#converts seed_phrase_index (with numbers) to binary
seed_phrase_binary = [format(number, "011b") if number != "?" else number for number in seed_phrase_index]
#calculates the number of bits missing for entropy
num_missing_bits = int(11-(1/3)*(len(seed_phrase)))
#calculates all the possible permutation of missing bits for entropy
missing_bits_possible = [bin(x)[2:].rjust(num_missing_bits, "0") for x in range(2**num_missing_bits)]
#combines the binary representation of seed phrase with each possible missing bits to result in the possible entropy
entropy_possible = ["".join(seed_phrase_binary[:-1])+bits for bits in missing_bits_possible]
#inputs each entropy_possible in the SHA256 function to result in the corresponding checksum
import hashlib
checksum = [format(hashlib.sha256(int(entropy, 2).to_bytes(len(entropy) // 8, byteorder="big")).digest()[0],"08b")[:11-num_missing_bits] for entropy in entropy_possible]
#combines the missing bits with its corresponding checksum
last_word_bits = [i + j for i, j in zip(missing_bits_possible, checksum)]
#transforms 11 bit number to indexed number and then the corresponding word in the BIP39 wordlist
last_word = [word_list[int(bits, 2)] for bits in last_word_bits]
print(last_word)
Now we can try each one of them to see which one results in a balance. It's only 8 of them so it's reasonably do-able (albeit a pain).
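To tie the steps together, here is a sketch that wraps the same logic in one function. It works on indexed numbers instead of words, so it needs no wordlist file. The function name is made up for this example. It is tried out on a seed phrase made of 23 copies of "abandon" (index 0) rather than on our own.

```python
import hashlib


def last_word_candidates(known_indexes):
    """Return the possible indexes of the last word, given the indexes of all other words."""
    num_words = len(known_indexes) + 1
    num_checksum_bits = num_words // 3
    num_missing_bits = 11 - num_checksum_bits
    known_bits = "".join(format(number, "011b") for number in known_indexes)

    candidates = []
    for x in range(2 ** num_missing_bits):
        # Pad the missing bits with leading zeros, e.g. 1 becomes "001".
        missing_bits = format(x, "0" + str(num_missing_bits) + "b")
        entropy = known_bits + missing_bits
        entropy_bytes = int(entropy, 2).to_bytes(len(entropy) // 8, byteorder="big")
        first_byte = hashlib.sha256(entropy_bytes).digest()[0]
        checksum = format(first_byte, "08b")[:num_checksum_bits]
        # Missing bits + checksum = the 11 bits of the last word.
        candidates.append(int(missing_bits + checksum, 2))
    return candidates


candidates = last_word_candidates([0] * 23)
print(len(candidates))  # 8
print(candidates)
```

In the real script, each returned number would be looked up with word_list[number] to get the word. Because the bit counts are worked out from the number of words, the same function should also handle a 12 word seed phrase with 11 known words.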
But, what if I have a word that's not necessarily the last word missing? We'll discuss that in the next part of our series: Find Any Word in a Seed Phrase.