# NSA Backdoor

## Challenge

I heard someone has been sneakily installing backdoors in open-source implementations of Diffie-Hellman... I wonder who it could be... ;)

## Solution

This challenge was very difficult for me and took over 10 hours. This is likely the intended method. Scroll down for a much simpler approach.

The gen.py script is very similar to the script provided in the Very Smooth challenge. The largest difference is that `c = pow(3, FLAG, n)`

is used instead of `c = pow(FLAG, e, n)`

(where `e = 3`

). Even though only one value was swapped, this is a substantial change that changes the encryption algorithm from RSA to Diffie-Hellman. So, instead of solving the discrete d-th root problem (RSA), we are solving the discrete logarithm problem (Diffie-Hellman). Here's a short explanation of the major difference between RSA and Diffie-Hellman.

There is one hint for this challenge and it reads "Look for Mr. Wong's whitepaper... His work has helped so many cats!" Searching online for "Wong NSA Backdoor" finds this paper titled "How to Backdoor Diffie-Hellman" (Archive). Searching for "diffie-hellman David Wong" finds this article (this is David Wong's website), which links to this YouTube video of a talk by David Wong explaining his paper. The paper and talk both link to mimoo/Diffie-Hellman_Backdoor on GitHub, which is a repo containing the code for this backdoor exploit.

To solve this challenge, I read the paper multiple times, watched the presentation, and looked at a variety of miscellaneous resources on the internet. So, I suggest you also read the paper and watch the presentation to gain a deep understanding of this challenge.

The paper works up to an attack called "Composite Modulus with B-Smooth Order (CM-HSO)," which is what we will be using in this challenge. In order for the CM-HSO backdoor to be a NOBUS (Nobody-But-Us) backdoor, the composite modulus $n = pq$ must be large enough such that it is not susceptible to factoring with Pollard's p-1 algorithm (just like in RSA). To quote the paper, this method's "security also relies on the RSA’s assumption that factoring n is difficult if n is large enough." In the paper and code, "as a way of countering Pollard's p-1 we can add a large factor to both $p-1$ and $q-1$ that we will call $p_{big}$ and $q_{big}$ respectively." However, as we saw in Very Smooth, the public modulus is factorable with Pollard's p-1 algorithm since these large factors are not used and because smooth primes are used in the gen.py script. Thus, we can use the same script as in Very Smooth to factor $n$. Running the factor.py script with the given `n`

produces the following values for `p`

and `q`

:

Note that ZeroBone/PollardRsaCracker is another implementation of Pollard's p-1 that you can try, but I found RsaCtfTool's implementation, which I use in factor.py, to be better.

According to the paper, "Since $p-1$ and $q-1$ are both B-smooth, they are susceptible to be factored with the Pollard's p-1 factorization algorithm, a factorization algorithm that can find a factor $p$ of $n$ if $p-1$ is partially-smooth."

Next, we need to compute $x$ where $g^x\equiv c\pmod n$ (can also we written as $g^x\bmod n=c$ since $c<n$). We know all variables except $x$ and we know that $n=p*q$.

In the backdoor_generator folder of the GitHub repo there is a script to generate backdoored Diffie-Hellman parameters called `backdoor_generator.sage`

and a script to test the attacks called `backdoor_generator_tests.sage`

. The testing script sounds the most interesting because it shows off the exploit versus the generator script shows how to set up the exploit.

Note that before continuing you are going to want to have SageMath installed on your computer, which you can do by following their official installation guide.

The `backdoor_generator.sage`

script discusses two methods of creating backdoored Diffie-Hellman values: Composite Modulus with Hidden Small Subgroup (CM-HSS) and CM-HSO, which I mentioned earlier. CM-HSS is what I thought was originally being used in this challenge since the paper states that it is no longer a Nobody-But-Us backdoor due to the work of Dorey et al. (Archive) and Coron et al's attack (Archive).

As we can see in the `backdoor_generator_tests.sage`

file (which I have saved to this directory) and section 4 of the paper, "two small subgroups" are used. These small subgroups exist because a generator g is chosen "such that both g modulo p and g modulo q generate 'small' subgroups." The existence of the subgroups allows "us to compute two discrete logarithms in two small subgroups instead of one discrete logarithm in one large group." The paper states "For example, we could pick $p$ and $q$ such that $p-1 = 2_{p_1p_2}$ and $q-1 = 2_{q_1q_2}$ with $p_1$ and $q_1$ two small prime factors and $p_2$, $q_2$ two large prime factors." Indeed, if we open backdoor_generator_tests.sage and look at the examples at the start of the `test_CM_HSS`

function we see a small factor for `p-1`

and another small factor for `q-1`

. We can confirm this by taking the bigger `p`

value and getting prime factors of `p-1`

in SageMath by running the below command:

This outputs the following:

Thus, we can see that the graphic in the paper right below the previously quoted text accurately displays the situation. `p-1`

and `q-1`

each factor into 2, a small prime, and a large prime. However, if we take our value for `p-1`

and get it's prime factors by running the following command:

This produces a list of 67 different factors. Therefore, our values for the challenge cannot be broken using this method and we will need to use CM-HSO. Looking at the examples in the `test_CM_HSO`

function in the test script, we see that a list of factors is provided, which is exactly what we have.

Luckily for us, David Wong has written most of the SageMath code for us in the `test_CM_HSO`

function in the test script. If you look at our solve.sage script, you'll notice that almost all of it is copied pasted from the `test_CM_HSO`

function. On lines 1-12 of our script, we define the values we were given and that we figured out so far, then we calculate `p-1`

and `q-1`

, and finally we compute the prime factors of `p-1`

and `q-1`

. Then, we simply apply the algorithm discussed in the paper and presentation to go through the many small subgroups and reconstruct the private key. At the end we print the private key solution and we compute $c=g^x\bmod n$ (where $x$ is the private key that we found). This is exactly how the $c$ value given to us was computed so if the values are the same then we have found a valid private key. Running the script with `sage solve.sage`

produces the following output:

The encrypted messages match so we have found the correct key. So, we have solved the discrete logarithm $c=g^x\bmod n$ for $x$. However, trying to decode this to ascii produces gibberish. This is because discrete logarithms can have multiple solutions. Essentially, it has something to do with group theory. The video explains quite nicely that $\phi(n)=(p-1)(q-1)$ (where $\phi(n)$ is the size of the group) if $n=pq$. According to the paper several properties of `p`

and `q`

result "in a non-cyclic group with an upper-bound on possible subgroup orders of $lcm(s,t)=\frac{(p-1)(q-1)}{2}$." So, `gmpy2.lcm(p-1, q-1) == ((p-1)*(q-1))//2`

. Therefore, Carmichael totient's is $\lambda=\frac{(p-1)(q-1)}{2}$. The Carmichael function $\lambda(n)$ is the exponent/order of the multiplicative group of integers modulo n. (This page explains more about the Carmichael function. We don't use Euler's totient because the Carmichael function is more generalized.) Therefore, we can add or subtract $\frac{(p-1)(q-1)}{2}$ from the solution our SageMath script found (because we have a cyclical group) and then compute $c=g^x\bmod n$ with our modified solution and still get the original $c$ value.

I'm still unsure about the math behind this. So, I messed around with the find_other_solutions.py script for a while and eventually found that subtracting $\frac{(p-1)(q-1)}{4}$ from the computed key worked successfully and got the flag.

### Simpler Solution

This method is not explained since it should be somewhat understandable if you read through the above method.

First, use Pollard's p-1 algorithm to calculate `p`

and `q`

. You can use the implementation in factor.py.

Second, run simple_solve.sage, which will print the hexadecimal key, `0x7069636f4354467b31636139333835387d`

, and the flag.

Essentially, all of Mr. Wong's code (splitting the problem into multiple parts and then using the CRT) is unnecessary for this challenge and sagemath's `discrete_log`

function is powerful enough.

### Flag

`picoCTF{1ca93858}`

Last updated