# Affine-Power S-Boxes over Galois Fields with Area-Optimized Logic Implementations 

Christopher A. Wood ${ }^{12}$, Stanisław P. Radziszowski ${ }^{2}$, Marcin Lukowiak ${ }^{3}$<br>${ }^{1}$ Department of Computer Science, UC Irvine<br>${ }^{2}$ Department of Computer Science, Rochester Institute of Technology<br>${ }^{3}$ Department of Computer Engineering, Rochester Institute of Technology


#### Abstract

Cryptographic S-boxes are fundamental in key-iterated substitution permutation network (SPN) designs for block ciphers. As a natural way for realizing Shannon's confusion and diffusion properties in cryptographic primitives through nonlinear and linear behavior, respectively, SPN designs served as the basis for the Advanced Encryption Standard and a variety of other block ciphers. In this work we present a methodology for minimizing the logic resources for $n$-bit affine-power $S$ boxes over Galois fields based on measurable security properties and finding corresponding area-efficient combinational implementations in hardware. Motivated by the potential need for new and larger S-boxes, we use our methodology to find area-optimized circuits for 8 - and 16 -bit S-boxes. Our methodology is capable of finding good upper bounds on the number of XOR and AND gate equivalents needed for these circuits, which can be further optimized using modern CAD tools.


Keywords: S-box design and construction, 16-bit S-boxes, composite Galois fields, S-box combinational circuits

## 1 Introduction

In order to enable area-efficient hardware implementations of block ciphers based on the key-iterated substitution permutation network (SPN) design principle, such as the Advanced Encryption Standard (AES) [13], the logic resources for each component or operation in the algorithm must be reduced as much as feasible. Traditionally, research efforts to minimize such resources have focused on the S-box - the only nonlinear operation in the algorithm. Minimizing the area required for the AES S-box has been the subject of intense research because combinational implementations of this component often consume a majority of the total logic needed for the algorithm. In addition, with the prevalence of side-channel attacks such as DPA and CPA $[22,4]$, combinational designs are necessary for secure implementations that cannot be easily exploited by these attacks.

To aid the implementation of future cryptographic primitives that rely on S-boxes, we present a comprehensive methodology for constructing and implementing cryptographically significant S-boxes with the goal of low-area combinational logic defined in terms of the number of XOR and AND gate requirements. An affine-power S-box $S(x)$ is a composite function $S(x)=A(P(x))$ consisting of a highly nonlinear power mapping $P$ over a binary Galois field and an affine transformation $A$. In developing our methodology we build upon the exhaustive mixed basis approach of Canright [6] and combinational logic minimization techniques of Boyar and Peralta [2] used for the AES S-box. Our methodology is composed of three steps: finding suitable affine-power S-box constructions, programmatically and exhaustively searching for implementation parameters (i.e. subfield decompositions and basis representations) that permit area-optimized circuits, and then efficiently mapping them into technology dependent resources using modern CAD tools.

We applied our methodology to find area-optimized 8- and 16-bit S-boxes over binary Galois fields. Using the affine-inverse construction leveraged by the AES S-box, we exhaustively searched for area-optimized S-boxes over $G F\left(2^{8}\right)$ defined by all 30 irreducible polynomials over $G F(2)$ of degree 8 . Our search produced an 8 -bit S-box with 103 XOR and 36 AND gates using the field polynomial $t(v)=v^{8}+v^{6}+v^{5}+v^{4}+v^{2}+v+1$, surpassing Canright's optimized S-box circuit for the AES, which uses a different field polynomial, by a single XOR gate prior to further logic optimization techniques. We also found new implementation parameters for the AES S-box that yield a reduction in a single XOR gate prior to the application of Boyar and Peralta's logic optimization techniques. In addition, for the 21 smallest irreducible polynomials of degree 16 over $G F(2)$, we found several 16 -bit S-box constructions that have small area footprints. For example, we found a set of implementation parameters that permit an area-optimized circuit composed of 1238 XOR and 144 AND gates. Even smaller gate counts were achieved for other polynomials.

## 2 Related Work

S-boxes in key-iterated SPN algorithms are often constructed as an affine transformation composed of an inverse power mapping over some Galois field, as is the case for the AES. This particular power mapping has many desired cryptographic properties that, in practice, effectively render many known cryptanalysis attacks ineffective. Consequently, much
of the research on low-area implementations for the affine-power S-boxes has focused on minimizing the combinational logic required for the multiplicative inverse calculation over binary Galois fields.

Out of all known methods to compute the multiplicative inverse in $G F\left(2^{n}\right)$, the use of subfield decomposition has been the most effective and accepted technique for implementing low-area circuits. Using composite field arithmetic, the inverse can be computed using the Itoh-Tsujii inversion algorithm [17] or by direct decomposition to bitwise operations over $G F(2)$ [30]. Under the assumption that the elements in a particular field are represented using a normal basis, the number of multiplications required in the Itoh-Tsujii inversion algorithm still yields complex combinational logic for small fields. In comparison, direct decomposition to $G F(2)$ has yielded significantly smaller circuits $[33,34,24,6,26,27,3,2]$.

To date, the smallest AES S-box using composite field arithmetic and other combinational logic minimization techniques is due to Boyar and Peralta, who found an implementation that required only $83 \mathrm{XOR} / \mathrm{XNOR}$ and 32 AND gates [2]. This fell below the previous area record of 104 XOR and 36 AND gate count by Canright in 2005 [6], which was found by trying all mixed basis representations of the field $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ to reduce the cost of relevant arithmetic when computing the multiplicative inverse and then factoring all basis change matrices needed to map between $G F\left(2^{8}\right)$ and $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$. Boyar and Peralta improved upon Canright's results by swapping his $G F\left(\left(2^{2}\right)^{2}\right)$ inversion circuit for their own optimized version and then performing subsequent combinational logic minimization on the entire S-box circuit. Given the effectiveness of this two-phase approach, we model our methodology after both of them.

## 3 Quantified Security of S-Boxes

With the continued improvement of cryptanalytic attacks that include different forms of linear and differential cryptanalysis [23,1] and algebraic analysis $[8,10]$, among many others $[16,18,21]$, it is critically important that cryptographic S-boxes do not exhibit weaknesses that can be exploited by these attacks. For example, linear cryptanalysis of SPN-based block ciphers exploits the existence of some linear combinations of input and output bits in the substitution step that occur with high (or low) probability [23]. Therefore, highly nonlinear S-boxes are ideal to reduce the probability that such linear combinations can be found and effectively exploited. To determine if an S-box has this property, we may compute the nonlinearity $\mathcal{N}_{l}(S)$ of a particular S-box mapping $S: G F(2)^{n} \rightarrow G F(2)^{m}$
as:

$$
\begin{equation*}
\mathcal{N}_{l}(S)=\min _{c \in S_{2}^{m}}\left\{\mathcal{N}_{l}(c \cdot S)\right\}=\min _{c \in S_{2}^{m}}\left\{\mathcal{N}_{l}\left(c_{0} f_{0} \oplus c_{1} f_{1} \oplus \cdots \oplus c_{m-1} f_{m-1}\right)\right\} \tag{1}
\end{equation*}
$$

where

$$
\begin{equation*}
\mathcal{N}_{l}(f)=2^{n-1}-\frac{1}{2} \max _{u \in G F(2)^{n}}\left|W_{f}(u)\right| \tag{2}
\end{equation*}
$$

is the nonlinearity of a Boolean function $f: G F(2)^{n} \rightarrow G F(2), f_{i}$ are the $m$ coordinate functions of $S$ for $i=1, \ldots, m$, is the inner product operator of two Boolean functions, and $W_{f}(u)=\sum_{x \in G F(2)^{n}}(-1)^{f(x) \oplus(u \cdot x)}$ is the Walsh transform of $f$ with respect to the input function $u$ [12]. Other cryptographic metrics of interest include: the maximum and minimum entries of the linear and difference distribution table $[23,1], \delta$-differential uniformity $[28,29]$, resiliency and correlation immunity [7], component algebraic immunity [7], XL and XLS algebraic immunity [25, 8], interpolation polynomial algebraic complexity [18, 13], and branch number [13].

All of these metrics can be computed within a reasonable amount of time for single $n$-bit S-boxes over fields $G F\left(2^{n}\right)$ when $n \leq 16$. For example, directly computing the differential uniformity can be done in $\mathcal{O}\left(2^{3 n}\right)$ time, which is feasible for a single 16-bit S-box. This complexity, however, prohibited such computations for all 16-bit S-boxes considered in this work. By computing these metrics for S -box candidates we may quantitatively compare the security of different constructions. In general, we seek to build S-boxes that have high nonlinearity, low differential uniformity, high resiliency, high algebraic immunity, and high algebraic complexity. We focus on these metrics when selecting possible S-box constructions in the first step of our methodology. In this step we find an affine transformation that can be composed of a suitable power mapping for the S-box. We describe this procedure in the following section.

## 4 Constructing Suitable S-Boxes

Our focus is on S-boxes built from power mappings over Galois fields, i.e. functions of the form $S(x)=x^{d}$ where $x \in G F\left(2^{n}\right)$ and $0 \leq d<2^{n}$. While other possible constructions exist, such as those based on the welldefined cryptographic properties of Boolean functions, there are several limitations that make these difficult to use in practice. For instance, they typically do not have compact algebraic expressions, which implies that hardware and software implementations often use lookup tables for such mappings. This may be acceptable for small $n$-bit S-boxes (i.e. $n \leq 8$ ), but not for $n \geq 16$.

Power mappings of the form $S(x)=x^{d}$ over the field $G F\left(2^{n}\right)$ are typically classified by their exponents $d$. The known exponents of power mappings over $G F\left(2^{n}\right)$ for even $n$ with substantially high nonlinearity and good differential uniformity properties are shown in Table 1.

Table 1: Cryptographically-significant power mappings.

| Name | Exponent $(\mathrm{d})$ | Ref. |
| :---: | :---: | :---: |
| Inverse | $-1 \equiv 2^{n}-2$ | $[29]$ |
| Gold | $2^{k}+1, \operatorname{gcd}\{k, n\}=1$ for some $1 \leq k \leq 2^{n}-1$ | $[9]$ |
| Kasami | $2^{2 k}-2^{k}+1, \operatorname{gcd}\{k, n\}=1$ for some $1 \leq k \leq n / 2$ | $[9]$ |
| Dobertin | $2^{4 k+3 k+2 k+k}-1$ over $G F\left(2^{n}\right)$ with $n=5 k$ | $[9]$ |
| Niho | $2^{m}+2^{m / 2}-1$ over $G F\left(2^{n}\right)$ with $n=2 m+1$ and $m$ even | $[9]$ |
|  | $2^{m}+2^{(3 m+1) / 2}-1$ over $G F\left(2^{n}\right)$ with $n=2 m+1$ and $m$ odd |  |
| Welch | $2^{m}+3$ over $G F\left(2^{n}\right)$ with $n=2 m+1$ | $[9]$ |

Since these S-boxes are intended for block ciphers, it is natural to impose the additional requirement that they are bijective. By Fermat's Little Theorem it is easy to see that this only occurs when $\operatorname{gcd}\left\{d, 2^{n}-1\right\}=$ 1. Interestingly, with this restriction and the constraint $n=m=16$, many of the possible values for $d$ are discarded and only the inverse exponent remains. See [35] for a proof of this claim.

Although these power mapping exponents are very well studied in the literature, we did not settle with them as the only candidates. In fact, for 8 -bit S-boxes, we exhaustively computed the nonlinearity and $\delta$-differential uniformity of all power mappings over $G F\left(2^{8}\right)$ using the AES field polynomial to determine if there exists suitable candidates that could be studied further. We found that there are only 8 distinct power mapping exponents and inverses $\left(d, d^{-1}\right)$ such that $\delta=4$ and $\mathcal{N}_{L}=$ 112, the optimal values for such power mappings: $(127,253),(191,251)$, $(223,247),(239,239),(247,223),(251,191),(253,127),(254,254) . W e ~ d i d$ not perform similar computations for 16 -bit S-boxes, simply because these computations are much more expensive. Also, since it is not necessary that an S-box be invertible (i.e. if the block cipher using the S-box is operated in CTR-mode), then we need to only find area-optimized circuits for either $d$ or $d^{-1}$.

After finding a candidate S-box construction with reasonable nonlinearity and $\delta$-differential uniformity, our next task was to modify the constructions to increase the algebraic complexity (note that the algebraic

```
Input: \(S(\cdot), S^{-1}(\cdot), G F\left(2^{n}\right), n, d\)
Output: M, c
done := False
repeat
    M := RandomMatrix \((G F(2), n, n)\)
    \(c:=\) RandomElement \(\left(G F\left(2^{n}\right)\right)\)
    if \(\operatorname{det}(\mathbf{M}) \neq 0\) then
        valid := True
        ForwardPairs, InversePairs := [ ]
        for each \(x \in G F\left(2^{n}\right)\) do
            if \(x\) is not a fixed point then
                \(z:=\mathbf{M} x^{d}+c\)
                ForwardPairs :=Append(ForwardPairs, \((x, z)\) )
                \(x^{\prime}:=\left(\mathbf{M}^{-1}(z+c)\right)^{d^{-1}}\)
                InversePairs :=Append(InversePairs, \(\left.\left(z, x^{\prime}\right)\right)\)
            end
            else
                valid := False
                break
            end
        end
        if valid \(=\) True then
            \(p(y):=\) Interpolate(ForwardPairs)
            \(p^{-1}(y):=\) Interpolate(InversePairs)
            if \(\# p(y)>n\) and \(\# p^{-1}(y)>n\) then
                return M, \(c\)
            end
        end
    end
until done \(=\) False
```

Algorithm 1: Probabilistic affine transformation search procedure, where $\# p(y)$ (resp. $\left.\# p^{-1}(y)\right)$ is the number of terms in the interpolation polynomial $p(y)$ (resp. $\left.p^{-1}(y)\right)$.
expression of an interpolation polynomial for an S-box defined solely by a power mapping consists of a single term). In order to avoid interpolation attacks, such expressions should have more terms (be more complex). Perhaps the most common technique for increasing the complexity is to compose an affine transformation of one such power mapping. Cui and Cao [11] proved that the algebraic complexity for any affine-power S-box over $G F\left(2^{n}\right)$ is bounded by $n+1$. Algorithm 1 presents a probabilistic procedure to search for an appropriate affine transformation for affine-power S-boxes, characterized by a matrix $\mathbf{M}$ and constant vector $c$. Using the same rationale for the affine transformation selection presented by Daemen and Rijmen in [14], this procedure searches for affine transformations
that have a "complex algebraic expression if combined with the inverse mapping" and, together with the inverse operation, have "no fixed points and no opposite fixed points." In this context, a fixed point or opposite fixed point occurs when there exists an element $x \in G F\left(2^{n}\right)$ such that $S(x) \oplus x=0^{n}$ or $S(x) \oplus x=1^{n}$. Since there are no known attacks that exploit the existence of fixed points, we opted to lift this constraint if the pair $\mathbf{M}$ and $c$ provide more opportunities for logic optimization than pairs that do not yield any fixed points.

## 5 Searching for Area-Efficient Tower Field Constructions

There are a variety of isomorphic representations for the fields $G F\left(2^{8}\right)$ and $G F\left(2^{16}\right)$. Using composite arithmetic to compute the multiplicative inverse requires arithmetic operations such as addition, multiplication, squaring, and scaling (i.e. multiplication by a constant) in the subfields. The complexity of such arithmetic heavily depends on the representation of elements in the subfields. Polynomial arithmetic is generally more computationally efficient with polynomials of a smaller degree. This can be shown by deriving the expressions for the arithmetic operations in these subfields. For example, given an element $\epsilon \in G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ (where $r(x)=x^{2}+x+\Pi$ defines $\left.G F\left(\left(2^{2}\right)^{2}\right)\right)$ represented in a polynomial basis $[1, X]$ with subfield coefficients $\delta_{1}$ and $\delta_{2}, \epsilon^{-1}$ can be computed as

$$
\epsilon^{-1}=\delta_{1}\left(\delta_{2}^{2}+\delta_{1} \delta_{2}+\delta_{1}^{2} \Pi\right)^{-1} x+\left(\delta_{1}+\delta_{2}\right)\left(\delta_{2}^{2}+\delta_{1} \delta_{2}+\delta_{1}^{2} \Pi\right)^{-1}
$$

If $\epsilon$ is represented in a normal basis $\left[X, X^{16}\right]$, the expression becomes

$$
\left.\epsilon_{1}^{-1}=\left(\left(\delta_{1} \delta_{2}+\left(\delta_{1}+\delta_{2}\right)^{2} \Pi\right)^{-1} \delta_{2}\right) x^{16}+\left(\delta_{1} \delta_{2}+\left(\delta_{1}+\delta_{2}\right)^{2} \Pi\right)^{-1} \delta_{1}\right) x .
$$

Deriving a general expression for inversion in $G F\left(\left(2^{2}\right)^{4}\right)$ depends on numerous factors, including the coefficients of the polynomial $r(x)$ and the basis representation. Given the numerous possibilities, we omit such derivations here, but it should be intuitively clear that the higher-degree polynomials representing elements in $G F\left(\left(2^{2}\right)^{4}\right)$ will lead to less compact expressions than the simple quadratic extension case in which we can always find an irreducible polynomial $r(x)$ with a unit $x$ coefficient. Consequently, we focus on the tower fields $\left.G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)^{2}\right)$ and $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ for $G F\left(2^{16}\right)$ and $G F\left(2^{8}\right)$, respectively. Using such isomorphic representations, the cost of all arithmetic operations with respect to the subfields using a polynomial and normal basis is given in Table 2.

Table 2: Cost of arithmetic in $G F\left(\left(q^{2}\right)^{2}\right)$ with respect to subfield $G F\left(q^{2}\right)$ (A)ddition, (M)ultiplication, (Sq)uare, (I)nversion, (Sc)ale, and (SS)quare-scale operations for polynomial and normal basis representations.

| Operation | Polynomial Basis | Normal Basis |
| :---: | :---: | :---: |
| Inverse | $3 M+2 A+1 I+1 S S$ | $3 M+2 A+I+1 S S$ |
| Add | $2 A$ | $2 A$ |
| Multiply | $3 M+4 A+1 S c$ | $3 M+4 A+1 S c$ |
| Square | $2 S q+1 S c+1 A$ | $3 A+2 S q+1 S c$ |

To this end, let $t(v)$ be a degree 16 (8) irreducible polynomial over $G F(2)$ for $G F\left(2^{16}\right)$ (analogously $G F\left(2^{8}\right)$ ), $s(y)=y^{2}+\Psi y+\Lambda$, be the irreducible polynomial for $\left.G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)^{2}\right), r(x)=x^{2}+\Theta x+\Pi$ be the irreducible polynomial for $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right), q(w)=w^{2}+\Omega w+\Sigma$ be the irreducible polynomial for $G F\left(\left(2^{2}\right)^{2}\right)$, and finally $p(v)=v^{2}+v+1$ be the only irreducible polynomial for $G F\left(2^{2}\right)$. We enforce $\Psi=\Theta=\Omega=1$ to simplify field arithmetic. Also, we denote by $V, W, X$, and $Y$ roots of the polynomials for the fields $G F\left(2^{2}\right), G F\left(\left(2^{2}\right)^{2}\right), G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$, and $G F\left(\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)^{2}\right)$, respectively, and refer to the forward and inverse basis change matrices needed to map elements from $G F\left(2^{8}\right)$ and $G F\left(2^{16}\right)$ to their isomorphic tower field partners as $\mathbf{T}$ and $\mathbf{T}^{-1}$.

Each irreducible polynomial for the fields $G F\left(2^{2}\right), \ldots, G F\left(\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)^{2}\right)$ will have two distinct conjugate roots, which we denote as the sets $\left\{V, V^{2}\right\}$, $\left\{W, W^{4}\right\},\left\{X, X^{16}\right\}$, and $\left\{Y, Y^{256}\right\}$. A polynomial basis for any field can be formed by selecting one of these roots as a basis element in conjunction with the identity element 1 , e.g. $[1, V]$ or $\left[1, V^{2}\right]$ for $G F\left(2^{2}\right)$, whereas a normal basis requires that both roots are used. For each possible combination of basis elements we then programmatically determine the combinational complexity of subfield arithmetic needed to compute the inverse.

For each combination of basis elements we also perform several arithmetic and logic optimizations. For instance, as Satoh [34] mentions, it is possible to save on the number of gates required for a circuit if there exists two $G F\left(\left(2^{m}\right)^{2}\right)$ multipliers that have a shared input. This is because both the polynomial and normal multipliers need to compute the sum of the two coefficients for the input elements, as shown in Figure 1. Therefore, every shared input factor will save one addition in the subfield. In addition, polynomial and normal multipliers for elements in $\operatorname{GF}\left(\left(2^{2}\right)^{2}\right)$ and $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ each have three subfield multipliers that will share a common factor, thus saving additional sub-subfield addition operations.


Fig. 1: XOR gate reduction for two $G F\left(\left(2^{2}\right)^{2}\right)$ multipliers with a shared input $B$. The single XOR gate is saved by not recomputing the sum of the two $B$ coefficients $B_{1}$ and $B_{2}$ of when $B$ is represented in a normal basis.

We also make use of the optimizations to the square-scale operations performed by Canright [6]. At a high level, such optimizations are used to derive compact expressions for the square-scale operations given particular values of $\Pi$, which can only take a fixed number of values in order to make $r(x)$ irreducible over $G F\left(\left(2^{2}\right)^{2}\right)$. We refer the reader to [35] and [6] for further discussion of these optimizations.

Our S-box construction program written in Magma [5] does not support exhaustive common subexpression elimination. This is primarily due to the fact that Magma does not support normal basis representations for finite field elements. Furthermore, exhaustively searching for all common subexpressions in all 432 possible inversion and square-scale algebraic expressions over $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ was outside the scope of this work. Future work will explore programmatically deriving such compact expressions in order to achieve lower gate counts. Also, it is important to note that, because we do not automatically apply the full set of Canright's optimizations, our gate counts will be upper bounds on the total number of gates. That is, the software that was written to count the number of gates for each field representation and basis selection will produce a result that is larger than or equal to what is presented in Canright's work, and as


Fig. 2: High-level diagram for a merged S-box circuit. The sel signal is used to toggle encryption and decryption modes.
shown in his detailed report, other optimizations can be applied to lower this bound even further. After the tower field implementation parameters have been identified, we then utilize the logic minimization techniques of Paar [31] and Boyar and Peralta [3] to reduce the XOR gate count for the basis change matrices, which are merely linear mappings represented as straight-line programs (SLPs). An SLP for a binary matrix-vector multiplication expression is a finite sequence of lines of the form $u:=\lambda v+\mu w$, where $\lambda$ and $\mu$ are elements in $G F(2), u, v$, and $w$ are variables, and some lines are output of the corresponding multiplication.

In addition to these algebraic and gate-level optimizations, we also follow in the footsteps of Satoh [34] and Canright [6] by performing logic minimizations on merged S-box designs. The merged S-box design simply pairs the forward and inverse S-box operations into the same circuit that use the same inversion component, where the output is determined by a simple multiplexer. A high-level overview of the merged circuit is shown in Figure 2. We optimize the matrices $\mathbf{T}^{-1} /(\mathbf{M T})^{-1}$ and $\mathbf{M T} / \mathbf{T}$ separately.

## 6 New S-Box Constructions and Implementations

We measure the complexity, or cost, of a particular S-box as the total number of XOR and AND gates required in a combinational circuit implementation. To determine this cost for merged S-box circuits we measure the cost of the basis transformation matrices $\mathbf{T}$ and $\mathbf{T}^{-1}$ merged with the affine transformation matrices $\mathbf{M}$, the cost of a single inversion circuit, and the weight of the affine constant $c$. For fixed $\mathbf{M}$ and $c$, we perform an exhaustive search over all mixed basis representations of the S-box field
to find an upper bound on the gates required for the inversion circuit. We then use the logic optimization technique of Boyar and Peralta [2] to reduce the number of XOR gates required for the merged basis change and affine transformation matrices.

For 16 -bit S-boxes, there are 128 choices for $s(y)$, eight choices for $r(x)$, two choices for $q(w)$, and only one choice for $p(v)$ that have a trace of unity. Since each of these polynomials has two distinct conjugate roots that can be used to represent the respective field elements with a polynomial or normal basis, there are exactly three basis element combinations in all degree-two extension subfields of $G F\left(2^{16}\right)$. Consequently, there is a total of 165888 possible cases to consider for a single polynomial $t(v)$. Since the basis change matrices depend on the representation of $G F\left(2^{16}\right)$, and there are 4080 candidates for $t(v)$, this means that we must consider about $6 \times 10^{8}$ possible cases to find a minimal transformation. Due to computational limitations, we selectively focused on the 21 smallest $t(v)$ polynomials when searching for 16-bit S-box implementation parameters. For 8 -bit S-boxes, there are only 30 candidate $s(v)$ polynomials with smaller basis change matrices, so we did not have to impose a similar computational restriction.

We applied our methodology to find 8-bit S-boxes over $G F\left(2^{8}\right)$ and new 16 -bit S-boxes over $G F\left(2^{16}\right)$. For the 8 -bit S-box case, we used Canright's optimized $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ inversion circuit when exhaustively searching for suitable implementation parameters. To perform this search, we consider all inverters which have a normal basis for $G F\left(\left(2^{4}\right)^{2}\right)$ because the shared multiplication factor saves 5 XOR gates over inverters with a polynomial basis for $G F\left(\left(2^{4}\right)^{2}\right)$. After Canright's optimizations, these S-boxes have anywhere from 66 to 68 XOR gates and 36 AND gates for the inverter [6]. Since the $G F\left(2^{8}\right)$ irreducible polynomial determines the number of XOR gates required for the basis change matrices $\mathbf{T}$ and $\mathbf{T}^{-1}$, we then considered all 30 degree 8 irreducible polynomials for $G F\left(2^{8}\right)$ to derive such basis change matrices. For each candidate inversion circuit and pair of basis change matrices $\mathbf{T}$ and $\mathbf{T}^{-1}$, we then applied the linear circuit minimization heuristic described by Boyar and Peralta in [3] to reduce the required XOR gates. This procedure was repeated for each irreducible polynomial $t(v)$ for $G F\left(2^{8}\right)$ and the basis representation that yielded the smallest number of required XOR and AND gates was recorded. Our results from this experiment for merged S-box designs are summarized in Table 2 in Appendix A.

We were able to improve upon Canright's S-box design using the AES polynomial by a single XOR gate, before logic gate optimizations such as using NAND/NOR instead of AND/XOR gates. With the same normal bases and coefficients $\Pi$ and $\Sigma$, we found a different embedding of $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ into $G F\left(2^{8}\right)$ that yielded merged basis change and affine transformation matrices able to be implemented in only 37 XOR gates, as opposed to 38 found by Canright (see Figure 3 for the basis change matrices and corresponding SLP for proof). This single gate is saved in our field isomorphism and by applying Boyar and Peralta's optimization technique for the merged matrices $\mathbf{T}^{-1} /(\mathbf{M T})^{-1}$ and $\mathbf{M T} / \mathbf{T}$.

Out of all 30 degree 8 irreducible polynomials over $G F(2)$, we found that $t(v)=v^{8}+v^{6}+v^{5}+v^{4}+v^{2}+v+1$ permitted an S-box circuit with the smallest area requirement of only 103 XOR and 36 AND gates (see Table 3). Using this selection of $t(v)$, the basis change matrices to map an element $\alpha \in G F\left(2^{8}\right)$ represented in a polynomial basis to $\beta \in$ $G F\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)$ represented with the bases $[1, V],\left[W, W^{4}\right]$, and $\left[X, X^{16}\right]$, where this tower field uses the coefficients $\Sigma=v$ and $\Pi=(v+1) w^{4}+w$, require at most 35 XOR gates in the merged S-box design (see the SLP in Figure 4 for proof). Further area improvements for this S-box are likely possible by applying Boyar and Peralta's SLP minimization techniques, in addition to CAD-driven optimizations. However, even in its current state, this design surpasses Canright's optimized circuit for the AES S-box, and as such may be of value for implementations of future cryptographic algorithms.

We then considered the 21 smallest degree 16 irreducible polynomials $t(v)$ over $G F(2)$ in search for area-optimized 16 -bit S-boxes. This search yielded several S-box constructions with small gate counts prior to (linear) logic optimizations of the basis change matrices. For the smallest irreducible polynomial $t(v)=v^{16}+v^{5}+v^{3}+v+1$, we found a set of implementation parameters that permitted a circuit with a total of 1238 XOR and 144 AND gates. This candidate, shown in Figure 5, uses the basis sets $[1, V],[1, W],[1, X],\left[Y^{256}, Y\right]$ to represent elements in $G F\left(\left(\left(\left(2^{2}\right)^{2}\right)^{2}\right)^{2}\right)$ and its respective subfields, where $\Sigma=v, \Pi=v w+v$, and $\Lambda=(v w+v) x+w$. The affine transformation and basis change matrices used to obtain the circuit are shown in Figure 5. A larger subset of these constructions are shown in Table 4 of Appendix A.

$$
\mathbf{T}^{-1}=\left(\begin{array}{llllllll}
1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 & 1
\end{array}\right)
$$

$$
\mathbf{T}=\left(\begin{array}{llllllll}
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{array}\right)
$$

| Forward SLP for $\mathbf{T}^{-1} /(\mathbf{M T})^{-1}$ | Inverse SLP for MT/T |
| :---: | :---: |
| 1) $y_{5}=x_{7}$ <br> 2) $t_{8}=x_{1}+x_{7}$ <br> 19) $t_{18}=x_{2}+t_{15}$ <br> 3) $t_{9}=x_{2}+t_{8}$ <br> 20) $y_{13}=t_{18}$ <br> 4) $y_{6}=t_{9}$ <br> 21) $t_{19}=x_{3}+t_{9}$ <br> 5) $t_{10}=x_{6}+t_{9}$ <br> 22) $y_{1}=t_{19}$ <br> 6) $y_{2}=t_{10}$ <br> 23) $t_{20}=x_{3}+t_{10}$ <br> 7) $t_{11}=x_{0}+x_{3}$ <br> 24) $y_{15}=t_{20}$ <br> 8) $y_{8}=t_{11}$ <br> 9) $t_{12}=x_{2}+t_{10}$ <br> 26) $y_{9}=t_{21}$ <br> 10) $t_{13}=x_{4}+t_{12}$ <br> 27) $t_{22}=x_{5}+t_{13}$ <br> 11) $y_{11}=t_{13}$ <br> 28) $y_{7}=t_{22}$ <br> 12) $t_{14}=x_{1}+t_{11}$ <br> 29) $t_{23}=t_{10}+t_{15}$ <br> 13) $y_{12}=t_{1} 4$ <br> 30) $y_{0}=t_{23}$ <br> 14) $t_{15}=x_{0}+x_{5}$ <br> 31) $t_{24}=t_{13}+t_{14}$ <br> 15) $t_{16}=x_{0}+t_{9}$ <br> 32) $y_{4}=t_{24}$ <br> 16) $y_{3}=t_{16}$ <br> 33) $t_{25}=x_{0}+x_{6}$ <br> 17) $t_{17}=x_{0}+t_{14}$ <br> 34) $t_{26}=t_{24}+t_{25}$ <br> 18) $y_{10}=t_{17}$ <br> 35) $y_{14}=t_{26}$ | 1) $y_{15}=x_{5}$ <br> 18) $t_{17}=x_{0}+t_{12}$ <br> 2) $t_{8}=x_{2}+x_{4}$ 19) $y_{13}=t_{17}$ <br> 3) $y_{0}=t_{8}$ <br> 20) $t_{18}=x_{1}+x_{6}$ <br> 4) $t_{9}=x_{3}+x_{6}$ 21) $y_{11}=t_{18}$ <br> 5) $y_{8}=t_{9}$ <br> 22) $t_{19}=t_{16}+t_{18}$ <br> 6) $t_{10}=x_{1}+t_{8}$ <br> 23) $t_{20}=x_{1}+x_{7}$ <br> 7) $t_{11}=x_{5}+t_{10}$ <br> 24) $y_{2}=t_{20}$ <br> 8) $t_{12}=x_{2}+t_{9}$ <br> 25) $t_{21}=x_{1}+t 9$ <br> 9) $y_{6}=t_{12}$ <br> 26) $y_{7}=t_{21}$ <br> 10) $t_{13}=x_{7}+t_{11}$ <br> 27) $t_{22}=x_{2}+x_{6}$ <br> 11) $y_{5}=t_{13}$ <br> 28) $y_{14}=t_{22}$ <br> 12) $t_{14}=x_{0}+t_{13}$ <br> 29) $t_{23}=x_{7}+t_{19}$ <br> 13) $y_{10}=t_{14}$ <br> 30) $y_{9}=t_{23}$ <br> 14) $t_{15}=x_{0}+x_{4}$ <br> 31) $t_{24}=t_{9}+t_{11}$ <br> 15) $y_{1}=t_{15}$ <br> 32) $y_{12}=t_{24}$ <br> 16) $t_{16}=x_{0}+t_{8}$ <br> 33) $t_{25}=t_{9}+t_{19}$ <br> 17) $y_{3}=t_{16}$ <br> 34) $y_{4}=t_{25}$ |

Fig. 3: Forward SLP for $\mathbf{T}^{-1} /(\mathbf{M T})^{-1}$ and inverse SLP for $\mathbf{M T} / \mathbf{T}$ for use in Canright's design of the AES S-box [6]. Collectively, they require 37 XOR gates to implement.

$$
\left.\begin{array}{c}
z=S(x)=\left(\begin{array}{llllllll}
0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{array}\right)\left(\begin{array}{l}
y_{7} \\
y_{6} \\
y_{5} \\
y_{4} \\
y_{3} \\
y_{2} \\
y_{1} \\
y_{0}
\end{array}\right)+\left(\begin{array}{l}
0 \\
0 \\
0 \\
0 \\
1 \\
0 \\
0 \\
0
\end{array}\right) \quad x=y^{-1}=\left(S^{-1}(z)\right)^{-1}=\left(\begin{array}{llllllll}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array}\right)\left(\begin{array}{c}
z_{7} \\
z_{6} \\
z_{5} \\
z_{4} \\
z_{3} \\
z_{2} \\
z_{1} \\
z_{0}
\end{array}\right) \\
\mathbf{T}=\left(\begin{array}{lllllll}
1 & 1 & 1 & 1 & 0 & 0 & 1
\end{array}\right) \\
0
\end{array}\right)
$$

| Forward SLP for $\mathbf{T}^{-1} /(\mathbf{M T})^{-}$ | Inverse SLP for MT/T |
| :---: | :---: |
| 1) $y_{0}=x_{6}$ <br> 17) $y_{2}=t_{15}$ <br> 2) $y_{9}=x_{2}$ <br> 18) $t_{16}=x_{2}+x_{7}$ <br> 3) $t_{8}=x_{0}+x_{4}$ <br> 19) $y_{5}=t_{16}$ <br> 4) $t_{9}=x_{6}+t_{8}$ <br> 20) $t_{17}=t_{8}+t_{16}$ <br> 5) $y_{11}=t_{9}$ <br> 21) $t_{18}=x_{1}+t_{11}$ <br> 6) $t_{10}=x_{1}+x_{7}$ <br> 22) $y_{6}=t_{18}$ <br> 7) $t_{11}=x_{5}+t_{9}$ <br> 23) $t_{19}=x_{3}+t_{9}$ <br> 8) $y_{4}=t_{11}$ <br> 24) $y_{12}=t_{19}$ <br> 9) $y_{15}=t_{11}$ <br> 25) $t_{20}=x_{6}+t_{12}$ <br> 10) $t_{12}=x_{3}+t_{10}$ <br> 26) $y_{8}=t_{20}$ <br> 11) $t_{13}=x_{4}+t_{12}$ <br> 27) $t_{21}=t_{10}+t_{17}$ <br> 12) $y_{7}=t_{13}$ <br> 28) $y_{13}=t_{21}$ <br> 13) $y_{14}=t_{13}$ <br> 29) $t_{22}=t_{11}+t_{17}$ <br> 14) $t_{14}=x_{0}+t_{10}$ <br> 30) $y_{1}=t_{22}$ <br> 15) $y_{3}=t_{14}$ <br> 31) $t_{23}=t_{14}+t_{15}$ <br> 16) $t_{15}=x_{2}+t_{9}$ <br> 32) $y_{10}=t_{23}$ | 1) $y_{2}=x_{1}$ <br> 2) $y_{14}=x_{0}$ <br> 19) $t_{18}=x_{4}+t_{11}$ <br> 3) $t_{8}=x_{0}+x_{1}$ <br> 20) $y_{7}=t_{18}$ <br> 4) $t_{9}=x_{2}+x_{4}$ <br> 21) $t_{19}=x_{2}+t_{18}$ <br> 5) $t_{10}=x_{3}+x_{6}$ <br> 22) $y_{10}=t_{19}$ <br> 6) $t_{11}=x_{5}+t_{8}$ <br> 23) $t_{20}=x_{3}+t_{12}$ <br> 7) $y_{13}=t_{11}$ <br> 24) $y_{4}=t_{20}$ <br> 8) $t_{12}=t_{8}+t_{9}$ <br> 9) $y_{15}=t_{12}$ <br> 10) $t_{13}=x_{0}+t_{10}$ <br> 11) $y_{0}=t_{13}$ <br> 12) $t_{14}=t_{12}+t_{13}$ <br> 13) $y_{6}=t_{14}$ <br> 14) $t_{15}=x_{3}+x_{4}$ <br> 15) $y_{3}=t_{15}$ <br> 16) $t_{16}=x_{6}+t_{12}$ <br> 33) $t_{25}=x_{0}+t_{17}$ <br> 17) $t_{17}=x_{3}+x_{7}$ <br> 34) $t_{26}=t_{18}+t_{25}$ <br> 18) $y_{5}=t_{17}$ <br> 35) $y_{11}=t_{26}$ |

Fig. 4: The 8-bit S-box and basis change matrices for polynomial $t(v)=$ $v^{8}+v^{6}+v^{5}+v^{4}+v^{2}+v+1$. The vector $y$ is the inverse of the element $x$ in $G F\left(2^{8}\right)$ (or $\overline{0}$ if $x=0$ ). Accordingly, the output $y=S^{-1}(z)$ is inverted in the same way to obtain the original element $x$. The forward S-box SLP for $\mathbf{T}^{-1} /(\mathbf{M T})^{-1}$ and inverse S-box SLP for $\mathbf{M T} / \mathbf{T}$ are also shown, which collectively require 35 XOR gates to implement.


$$
\mathbf{T}=\left(\begin{array}{llllllllllllllll}
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{array}\right)
$$



Fig. 5: The 16 -bit S-box and basis change matrices for the polynomial $t(v)=v^{16}+v^{5}+v^{3}+v+1$. The vector $y$ is the inverse of the element $x$ in $G F\left(2^{16}\right)$ (or $\overline{0}$ if $x=0$ ). Accordingly, the output $y=S^{-1}(z)$ is inverted in the same way to obtain the original element $x$.

## 7 Conclusion

In this work we presented a comprehensive methodology for identifying cryptographically significant S-box constructions based on power mappings over $G F\left(2^{n}\right)$ and searching for composite-field representations that permit low-area hardware implementations. We applied our technique to 8-bit S-boxes defined over $G F\left(2^{8}\right)$ using all 30 degree 8 irreducible polynomials and found several circuits with area-optimized implementations on par with or surpassing the AES equivalent (pending CAD optimizations). Motivated by a potential need for larger S-boxes, we then scaled up our procedure to 16 -bit S-boxes. We believe this methodology and our results may be useful in the design of future cryptographic algorithms.

## References

1. Eli Biham and Adi Shamir. Differential Cryptanalysis of DES-Like Cryptosystems. Journal of Cryptology 4.1 (1991), 3-72.
2. Joan Boyar and René Peralta. A Small Depth-16 Circuit for the AES S-Box. IFIP Advances in Information and Communication Technology, Springer Berlin Heidelberg 376 (2012), 287-298.
3. Joan Boyar, Philip Matthews, and René Peralta. Logic Minimization Techniques with Applications to Cryptology. Journal of Cryptology (2012), 1-33.
4. Eric Brier, Christophe Clavier, and Francis Olivier. Correlation Power Analysis with a Leakage Model. Cryptographic Hardware and Embedded Systems - CHES 2004, Springer Berlin Heidelberg (2004), 16-29.
5. John Cannon and Allan Steel. The Magma computational algebra system. Software available online (magma.maths.usyd.edu.au) (2005).
6. David Canright. A Very Compact S-Box for AES. CHES 2005-Cryptographic Hardware and Embedded Systems, Springer Berlin Heidelberg (2005), 441-455.
7. Claude Carlet and Emmanuel Prouff. On a New Notion of Nonlinearity Relevant to Multi-output Pseudo-random Generators. Selected Areas in Cryptography, Springer Berlin Heidelberg (2004).
8. Nicolas T. Courtois and Josef Pieprzyk. Cryptanalysis of block ciphers with overdefined systems of equations. Advances in Cryptology - ASIACRYPT 2002, Springer Berlin Heidelberg, (2002), 267-287.
9. Nicolas T. Courtois, Blandine Debraize, and Eric Garrido. On Exact Algebraic [Non-]Immunity of S-Boxes Based on Power Functions. Information Security and Privacy, Springer Berlin Heidelberg (2006).
10. Gregory V. Bard, Nicolas T. Courtois, and Chris Jefferson. Efficient Methods for Conversion and Solution of Sparse Systems of Low-Degree Multivariate Polynomials over $G F(2)$ via SAT-Solvers. Presented at ECRYPT workshop Tools for Cryptanalysis (2007).
11. Lingguo Cui and Yuanda Cao. A New S-Box Structure Named Affine-PowerAffine. International Journal of Innovative Computing, Information and Control 3.3 (2007), 751-759.
12. Thomas W. Cusick and Pantelimon Stănică. Cryptographic Boolean Functions and Applications. Academic Press (2009).
13. Joan Daemen and Vincent Rijmen. Advanced Encryption Standard (AES) (FIPS 197). Technical report, Katholijke Universiteit Leuven/ESAT (2001).
14. Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES-the Advanced Encryption Standard. Springer (2002).
15. Hans Dobbertin. Almost Perfect Nonlinear Power Functions on $\operatorname{GF}\left(2^{n}\right)$ : The Welch Case. IEEE Transactions on Information Theory 45(4) (1999), 1271-1275.
16. Henri Gilbert and Thomas Peyrin. Super-Sbox Cryptanalysis: Improved Attacks for AES-like Permutations. Fast Software Encryption, Springer Berlin Heidelberg (2010).
17. Toshiya Itoh and Shigeo Tsujii. A Fast Algorithm for Computing Multiplicative Inverses in $G F\left(2^{m}\right)$ Using Normal Bases. Information and Computation 78.3 (1988), 171-177.
18. Thomas Jakobsen and Lars R. Knudsen. The Interpolation Attack on Block Ciphers. 4 th International Workshop on Fast Software Encryption LNCS, Springer 1267 (1997), pp. 28-40.
19. Jérémy Jean, María Naya-Plasencia, and Thomas Peyrin. Improved Cryptanalysis of AES-like Permutations. Journal of Cryptology (2013), 1-27.
20. Alan Kaminsky, Michael Kurdziel, Stanisław Radziszowski. An Overview of Cryptanalysis Research of the Advanced Encryption Standard. Proceedings of MILCOM'2010, San Jose, CA (2010).
21. Lars R. Knudsen. Truncated and Higher Order Differentials. Fast Software Encryption, Springer Berlin Heidelberg 1008 (1995).
22. Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. Advances in Cryptology - CRYPTO99, Springer Berlin Heidelberg (1999).
23. Mitsuru Matsui. Linear cryptanalysis method for DES cipher. Advances in Cryptology - EUROCRYPT93, Springer Berlin Heidelberg, (1994).
24. Nele Mentens, Lejla Batina, Bart Preneel, and Ingrid Verbauwhede. A Systematic Evaluation of Compact Hardware Implementations for the Rijndael S-Box. Topics in Cryptology-CT-RSA, Springer Berlin Heidelberg (2005), 323-333.
25. Nawaz Yassir, Kishan Chand Gupta, and Guang Gong. Algebraic Immunity of S-Boxes Based on Power Mappings: Analysis and Construction. IEEE Transactions on Information Theory 55.9 (2009), 4263-4273.
26. Svetla Nikova, Vincent Rijmen, and Martin Schläffer. Using Normal Bases for Compact Hardware Implementations of the AES S-box. Security and Cryptography for Networks. Springer Berlin Heidelberg (2008), 236-245.
27. Yasuyuki Nogami, Kenta Nekado, Tetsumi Toyota, Naoto Hongo, and Yoshitaka Morikawa. Mixed Bases for Efficient Inversion in $F_{\left(\left(2^{2}\right)^{2}\right)^{2}}$ and Conversion Matrices of SubBytes of AES. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E94-A:6 (2011), 1318-1327.
28. Kaisa Nyberg. Perfect nonlinear S-boxes. Advances in Cryptology - EUROCRYPT91. Springer Berlin Heidelberg (1991).
29. Kaisa Nyberg. Differentially Uniform Mappings for Cryptography. Advances in Cryptology - Eurocrypt93. Springer Berlin Heidelberg (1994).
30. Christof Paar. Some Remarks on Efficient Inversion in Finite Fields. 1995 IEEE International Symposium on Information Theory (1995).
31. Christof Paar. Optimized Arithmetic for Reed-Solomon Encoders. Proceedings of the 1997 IEEE International Symposium on Information Theory (1997).
32. Vincent Rijmen. Efficient Implementation of the Rijndael S-box. Katholieke Universiteit Leuven, Dept. ESAT, Belgium (2000).
33. Atri Rudra, Pradeep K. Dubey, Charanjit S. Jutla, Vijay Kumar, Josyula R. Rao, and Pankaj Rohatgi. Efficient Rijndael Encryption Implementation with Composite Field Arithmetic. Cryptographic Hardware and Embedded Systems - CHES, Springer Berlin Heidelberg (2001).
34. Akashi Satoh, Sumio Morioka, Kohji Takano, and Seiji Munetoh. A Compact Rijndael Hardware Architecture with S-Box Optimization. Advances in Cryptology ASIACRYPT, Springer Berlin Heidelberg (2001), 239-254.
35. Christopher A. Wood. Large Substitution Boxes with Efficient Combinational Implementations. M.S. Thesis, Computer Science, Rochester Institute of Technology (August 2013).

## A S-Box Constructions

In this appendix we provide detailed parameters for a variety of S-box constructions. A basis $B=\left[\beta^{n_{1}}, \beta^{n_{2}}\right]$ is used to represent an arbitrary element $\alpha \in G F\left(\left(q^{m}\right)^{2}\right)$ as $\alpha=a_{1} \beta^{n_{1}}+a_{2} \beta^{n_{1}}$ for some $a_{1}, a_{2} \in G F\left(q^{m}\right)$. The basis element powers $n_{1}$ and $n_{2}$ are chosen such that $B$ is a polynomial or normal basis. In particular, if $n_{1}=0$, then $B$ must be a polynomial basis where $n_{2} \in\left\{1, q^{m}\right\}$. If $n_{1} \neq 0$ and $n_{2} \neq 0$ then $B$ must be a normal basis. The order of normal basis elements in $B$ depends on how Magma selects primitive elements. Specifically, if $\beta^{q^{m}}$ is chosen as the primitive element then our S-box construction program will fix the basis $B=\left[\beta^{q^{m}}, \beta\right]$.

In the following tables we encode each irreducible polynomial $t(v)$, constant $c$, and binary matrix ( $\mathbf{T}, \mathbf{T}^{-1}$, and $\mathbf{M}$ in row order), which are described in Sections 4 and 5, as a hexadecimal string. $\Sigma, \Pi$, and $\Lambda$, the irreducible polynomial coefficients described in Section 5, are shown with a polynomial basis. We also use the notation $\mathbb{F}!v$ to denote the embedding of $v$ into the field $\mathbb{F}$, where $\mathbb{F}=G F\left(2^{8}\right)$ or $\mathbb{F}=G F\left(2^{16}\right)$ for 8 and 16 bit S-boxes, respectively. The subfield bases are exactly as described in Section 5. Finally, the Inv. and Total fields denote the number of XOR gates required for the multiplicative inverse and merged S-box circuits, respectively.
Table 3: Area-optimized 8-bit S-box constructions for merged circuit implementations using all 30 irreducible poly-
nomials $t(v)$ of degree 8 over $G F(2)$. The entry with polynomial $t(v)=177$ corresponds to the construction given in Figure 4.

|  | $\Sigma$ | $\Pi$ |  |  | \|F! ${ }^{\text {c }}$ | Bases | T | T | M |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 177 | $v$ |  | B7 | 88 | 2 D | $[1, V],\left[W, W^{4}\right],[X$ | F20AEC5DBEC480E8 | 0227AAC18E21CE59 |  |  |  |  |
| 11 D |  |  | D6 | 98 | C4 | $[1, V],[1, W],\left[X, X^{16}\right]$ | 5339882404 D 70232 | 8C76181BACO |  | 8 E | 67 |  |
| 1 CF |  |  | 3 D | ED |  | $[1, V],[1, W],[X$, | OCC82C469FE022AD | 72D6A00BE464A253 | 520B02BAB98646E4 | 81 |  |  |
| 187 |  |  | AB | 74 | 57 | $\left[V, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 03 | 7F51A9F16169F373 |  |  |  |  |
| 1E7 |  | $(v+1) w$ | 84 | 31 | F1 | $[1, V],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 91E4C9417D8AA092 | F8A5FACDC8E734B5 |  |  | , |  |
| 19 F |  | $(v+1) w$ | FA | 23 | 73 | $\left[V, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | A6 | 43E155D129DB4D67 | 438496B8F84D5DEO | 72 | 66 |  |
| 13 |  | (v | 94 | 28 | 15 | $[1, V],\left[1, W^{4}\right],\left[X^{16}, X\right]$ | OACC409B041BE658 | 1420C25D7C08FCD | 011E10437454CDD |  | 67 |  |
| 1 BD |  |  | 26 | 53 | 8D | $\left[1, V^{2}\right],[1, W],\left[X^{16}, X\right]$ | 5D84028CBB93CAD2 | C6B45C53508620B1 | 45E589B90CDA65C |  | 67 |  |
| 1 A9 |  | $(v+1$ | C7 | F6 | 52 | $[1, V],[1, W],\left[X, X^{16}\right]$ | 24F940D16E60791A | 422024A974A4DCD | 83A31CD507280ADB |  | 67 |  |
| 18 |  | $(v+1) w$ | 77 | C1 | F5 | $\left[1, V^{2}\right],\left[1, W^{4}\right],\left[X, X^{16}\right]$ | BF539B1BE6B12823 | 30BEEC13EE4C26CB | 264CF2FC9B8FB78 |  | 67 |  |
| 1 A3 |  | $(v+1) w$ | 29 | BB | 27 | [ $1, V],\left[1, W^{4}\right],\left[X^{16}, X\right]$ | 84 | 40BA223556COF2E1 | E1FAC8996F3023C1 |  |  |  |
| 118 |  |  | BD | 5 | FE | $\left[V, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 12 | E77163E19B0161 | F87C3E1F8FC7E3F |  |  |  |
|  |  |  |  | FA | F5 | $\left[V, V^{2}\right],\left[W, W^{4}\right],[X$, | 4E | 49CF2BCD513B258F | 011E10437454CDD9 |  |  |  |
| 17 |  | (v | 6 C | 7 E |  | $[1, V],[1, W],\left[X^{16}, X\right]$ | 66 | 1 | 4 |  |  |  |
|  |  |  | AD |  | 5 A | $\left[V, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right.$ | 56 | 9F0391BF93EDD12B | 88 |  |  |  |
| 15 F |  |  |  |  |  | $\left[1, V^{2}\right],\left[1, W^{4}\right],[X$, | 590888EC937B02B4 | 60AA86FB401C0291 | 16 |  |  |  |
| 12D |  |  |  | 59 | 71 | $[1, V],[1, W],\left[X, X^{16}\right]$ | 02955F7142644E4 | 5488BA173C36803 | 6C994280381784 |  |  |  |
| 1B1 |  |  | cc | F3 | 98 | $[1, V],\left[1, W^{4}\right],\left[X^{16}, X\right]$ | 3D246A9D1B460 | CE4213 | 711350768 |  |  |  |
|  |  |  | 4 E |  | 90 | $\left.1, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 11D00A938A8FF |  | 18D74F6D8 |  |  |  |
| 12B |  |  | EB | 3D | 3B | $\left[V, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | CF9A81142B398 | 1DED670F511F03 | 3DBEB301 |  |  |  |
| 171 |  |  | DA | F0 | 39 | $[1, V],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 2F8FE194C42802B | 6E5FAE47AA3902BF | B5F1DBE1FC57284F | E8 |  |  |
| $1 \mathrm{F3}$ |  |  | 71 | AA | 26 | $[1, V],\left[W, W^{4}\right],\left[X, X^{16}\right]$ | 72806B022D6 | 40FB2C4722C310 | 19A0 | E9 |  |  |
| 1 1F5 |  | $(v+1) w$ | 50 |  | E3 | $[1, V],\left[1, W^{4}\right],\left[X, X^{16}\right]$ | BB | 82B410BBDC466C19 | 6FDD1A839A7FB39 | F0 | 67 |  |
| 169 |  | $v w+1$ | 7 F | 13 |  | $\left[1, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 88807F8F2ADF | 40EB64FFC03DAC01 | 1D | D7 | 66 |  |
| 139 |  | $v w+v+$ | D4 | 4B | 13 | $[1, V],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | FDAD82467 | $9 E B 3 C C 73041$ DBE0B | E8B | F2 | 66 |  |
| 165 |  | vw | 89 | 73 | FC | $\left[V, V^{2}\right],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | 4782288D 12DD9C | 5B07F713D79D1B01 | 9 A 1 A |  | 66 |  |
| $1 \mathrm{F9}$ |  | vw | co | B2 | 82 | $\left[1, V^{2}\right],\left[1, W^{4}\right],\left[X^{16}, X\right]$ | BD288426C480FB | 0428BA43FA24 | 34026D099E44E22D |  | 68 |  |
| 1DD |  | $v w+v+$ | A1 | 88 | 22 | $[1, V],\left[W, W^{4}\right],\left[X^{16}, X\right]$ | DOD791DDA5 | DC7D3A217E33 | 7DD62EE52D1B32B9 |  |  |  |
| $1 \mathrm{D7}$ |  | $(v+1) w$ | 35 | 73 | 5A | $1, V],\left[W, W^{4}\right],\left[X^{16}, X\right]$ |  | 505304D97EDB3CBD | A0758DF7DEB59BE0 |  | E6 |  |

Table 4: Subset of the smallest area-optimized 16-bit S-box constructions for merged circuit implementations found using our methodology.

| $t(v)$ | 1012F | 1018F | 10175 | 1015D |
| :---: | :---: | :---: | :---: | :---: |
| $\Sigma$ | $v+1$ | $v+1$ | $v$ | $v$ |
| $\Pi$ | $v w+1$ | $v w+v$ | $v w$ | $(v+1) w+1$ |
| $\Lambda$ | $(v w+1) x$ | $(v w+1) x+v w+v+1$ | $((v+1) w+v) x$ | $((v+1) w+1) x$ |
| FF! $v$ | 8AA4 | A477 | F723 | 10C |
| IF! $w$ | 5628 | 6610 | 953F | 1B79 |
| FF! $x$ | E432 | 45D1 | C130 | 8E3D |
| $\mathbb{F}!y$ | 7FC0 | A8D2 | 12A9 | 849F |
| Bases | $\left[\begin{array}{l} {\left[V, V^{2}\right],\left[1, W^{4}\right]} \\ {\left[1, X^{16}\right],\left[Y^{256}, Y\right]} \end{array}\right.$ | $\begin{aligned} & {\left[1, V^{2}\right],[1, W]} \\ & {\left[X, X^{16}\right],\left[Y^{256}, Y\right]} \end{aligned}$ | $\begin{aligned} & {\left[1, V^{2}\right],[1, W]} \\ & {\left[X, X^{16}\right],\left[1, Y^{256}\right]} \end{aligned}$ | $\begin{aligned} & {\left[V, V^{2}\right],\left[1, W^{4}\right]} \\ & {\left[1, X^{16}\right],\left[Y, Y^{256}\right]} \end{aligned}$ |
| $\mathbf{T}^{-1}$ | 5604FC5A41E67B644A40A8BEF62D | 481CC788160890C95C3826FA6AAC | 2EB8C0C54644A226B89D3EAFE542 | 1040FB41564B5837AA5A8B8CF735 |
| T | A03F998C29ECECA261AA8C68D89B | A923247E3ED95F525A598C2463DC | 0617A0B82ECAFE880051803463FD | 0ED2C47C0C07040CE435D4064289 |
| M | $9 F 5 A 116333100 C B 33 A 4604 F D 272 A$ | 66FEAC4F696AC9C4275E7DDCBA77 | A6EAF2D22BC542875BE444ECAF5A | 1BB5CC9E4C55E7F139D7E3584A5D |
| c | 1A2E | 8EA3 | 39F8 | F9D8 |
| Inverse | 367 | 376 | 390 | 367 |
| Total | 1209 | 1230 | 1231 | 1238 |

