# Permute 3.4.3

When cufftXtMemcpy() is used to copy data from GPU memory back to host memory, the results are in natural order regardless of whether the data on the GPUs is in natural order or permuted. Using CUFFT_COPY_DEVICE_TO_DEVICE allows users to copy data from the permuted data format produced after a single transform to the natural order on GPUs.

For single 2D or 3D transforms on multiple GPUs, when cufftXtMemcpy() distributes the data to the GPUs, the array is divided on the X axis. For example, with two GPUs, half of the X-dimension points, for all Y (and Z) values, are copied to each GPU. When the transform is computed, the data are permuted so that they are divided on the Y axis. That is, half of the Y-dimension points, for all X (and Z) values, are on each GPU.
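The two layouts described above can be pictured with a small toy model. The sketch below is plain Python, does not call cuFFT, and ignores memory layout entirely. It only records which (x, y) indices each GPU would own when a 2D array is split on the X axis (natural order) or on the Y axis (permuted order). The function name `gpu_contents` and its parameters are hypothetical, and the model assumes the split dimension divides evenly among the GPUs.

```python
def gpu_contents(nx, ny, n_gpus, split_axis):
    """Toy model: return, per GPU, the set of (x, y) indices it owns.

    split_axis="x" models the natural-order split described above,
    split_axis="y" models the permuted split after a transform.
    Assumption: the split dimension is divisible by n_gpus.
    """
    if split_axis not in ("x", "y"):
        raise ValueError("split_axis must be 'x' or 'y'")
    size = nx if split_axis == "x" else ny
    if size % n_gpus != 0:
        raise ValueError("split dimension must be divisible by n_gpus")
    chunk = size // n_gpus
    contents = [set() for _ in range(n_gpus)]
    for x in range(nx):
        for y in range(ny):
            coord = x if split_axis == "x" else y
            contents[coord // chunk].add((x, y))
    return contents


natural = gpu_contents(4, 4, 2, "x")   # GPU 0 owns x in {0, 1}, all y
permuted = gpu_contents(4, 4, 2, "y")  # GPU 0 owns y in {0, 1}, all x
```

In both layouts every index is owned by exactly one GPU; only the axis along which ownership is split differs.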

2D and 3D multi-GPU transforms support execution of a transform given permuted order results as input. After execution in this case, the output will be in natural order. It is also possible to use cufftXtMemcpy() with CUFFT_COPY_DEVICE_TO_DEVICE to return 2D or 3D data to natural order.

In practice, a permutation test takes your observed data and shuffles (or permutes) part of it. After each shuffle, some aspect of the data is recalculated. That could be, for instance, the correlation coefficient or a difference in means between two groups. The data are then randomly reshuffled again, and the test statistic is recalculated. This is repeated thousands of times, for as many shuffles as are deemed acceptable. This is usually a minimum of 1,000, but typically at least 10,000 shuffles are done. After all the permutations (shuffles) are performed, a distribution of the statistic of interest is generated from the permutations. This is compared to the original observed statistic (e.g. correlation coefficient, difference in group means) to see if the observed value is unusually large compared to the permuted data.
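The procedure can be sketched in a few lines of Python. This is a minimal illustration for a difference in group means, not a reference implementation. The function name `permutation_test`, the two-sided comparison, and the "+1" correction in the p-value are choices made here for the example.

```python
import random


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value). Illustrative sketch only.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return sum(first) / len(first) - sum(second) / len(second)

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean_diff(pooled)) >= abs(observed):
            extreme += 1
    # Count the observed labelling as one of the permutations.
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, p_value


observed, p_value = permutation_test(
    [10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6], n_shuffles=2_000
)
```

A small p-value means that few shuffled datasets produced a difference at least as large as the observed one.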

We state our main result Theorem 2.1 in Sect. 2; in particular this section also contains in Sect. 2.3 the definition of the sequences used to characterise the output of the greedy algorithm of Steinerberger as well as an outline of our proof strategy. In Sect. 3, we review properties of the van der Corput sequence. We also recall important generalisations and the method of Faure to calculate the discrepancy of permuted van der Corput sequences. In Sect. 4, we study the discrepancy of the particular generalised van der Corput sequences used to characterise the output of the algorithm. As a main result, we show in Theorem 4.5 resp. in Corollary 1 that all such sequences have the same discrepancy. In Sect. 5, we relate our results to the greedy algorithm. We show in Theorem 5.5 that the classical van der Corput sequence is an admissible output of the algorithm and we finally prove Theorem 2.1.

Faure [11] generalised the definition of the classical van der Corput sequence in two ways. First, he replaced the binary representation of an integer by its general \(b\)-adic representation for a fixed integer base \(b\ge 2\). This allows for the definition of the \(b\)-adic radical inverse function \(S_b\), which in turn can be used to define van der Corput sequences in general bases; i.e. \(\mathcal S_b=(S_b(n))_{n\ge 0}\). Furthermore, for \(\sigma \in \mathfrak S_b\) Faure defines the generalised (or permuted) van der Corput sequence \(\mathcal S_b^\sigma =(S_b^\sigma (n))_{n \ge 0}\) for a fixed base \(b\ge 2\) via the permuted \(b\)-adic radical inverse function
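A minimal sketch of this function, assuming the usual formulation: writing \(n=\sum_{j\ge 0} a_j(n)\, b^j\) with digits \(a_j(n)\in \{0,\dots ,b-1\}\), one sets \(S_b^\sigma (n)=\sum_{j\ge 0}\sigma (a_j(n))\, b^{-j-1}\). The code below additionally assumes \(\sigma (0)=0\), so that the sum is finite. The function name `radical_inverse` and the encoding of \(\sigma\) as a Python list are illustrative choices, not notation from [11].

```python
from fractions import Fraction


def radical_inverse(n, base=2, sigma=None):
    """Permuted b-adic radical inverse of n as an exact Fraction.

    sigma is a permutation of range(base) given as a list; the default
    is the identity, which yields the classical van der Corput sequence.
    Assumption: sigma[0] == 0, so leading zero digits contribute nothing.
    """
    if sigma is None:
        sigma = list(range(base))
    if sorted(sigma) != list(range(base)):
        raise ValueError("sigma must be a permutation of range(base)")
    if sigma[0] != 0:
        raise ValueError("this sketch assumes sigma[0] == 0")
    value = Fraction(0)
    scale = Fraction(1, base)  # weight b^(-j-1) of the j-th digit
    while n > 0:
        n, digit = divmod(n, base)  # peel off the lowest digit
        value += sigma[digit] * scale
        scale /= base
    return value


classical = [radical_inverse(n) for n in range(8)]
# 0, 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8
```

Exact rational arithmetic via `Fraction` avoids rounding when the points are later sorted or compared.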

We prove Theorem 2.1 in two steps. In the first step (i.e. in Sect. 4), we calculate the discrepancy of permuted van der Corput sequences \(\mathcal S_{2^m}^\sigma \), \(\sigma \in \mathcal P_m\). We conclude in Corollary 1 that all such sequences have the same extreme discrepancy as the classical van der Corput sequence \(\mathcal S_2\). Our argument uses the general machinery of Faure for the calculation of the discrepancy of permuted van der Corput sequences, which we recall in Sect. 3, and is based on various symmetries exhibited by the permutations in \(\mathcal P_m\).

In the second step (i.e. in Sect. 5), we relate our particular family of permuted van der Corput sequences to the greedy algorithm of Steinerberger. We show that every such sequence can be obtained from the algorithm (Lemma 5.8) and that every output of the algorithm can be described by such a sequence (Lemma 5.7). Thus, we obtain a full characterisation of the possible outputs of the greedy algorithm and by the results of the first step we also know the discrepancy of any such output.

Faure defined an operation [11, Section 3.4.3] which takes two arbitrary permutations \(\sigma , \tau\) in bases b and c and outputs a new permutation, \(\sigma \cdot \tau\) in base \(b\cdot c\). The motivation for this definition comes from the following property which was first noted in [11, Proposition 3.4.3].

In this section, we study the discrepancy of sequences generated from permutations in \(\mathcal P_m \subset \mathfrak S_{2^m}\). The main result of this section is the observation that all the functions \(\psi _{2^m}^\sigma \) are identical for \(\sigma \in \mathcal P_m\). Thus, using Lemma 3.6, we see that all permutations in \(\mathcal P_m\) generate permuted van der Corput sequences with the same asymptotic discrepancy constant (in fact, with the same discrepancy) as the classical van der Corput sequence. We summarise the main observations of this section in Corollary 1.

We prove the theorem by induction on m. The theorem is trivially true for \(m=1\). Now let \(m=2\). It is easy to see that both permutations give rise to the same \(\psi\)-function. To turn to the induction step, we assume that the assertion is true up to m. In particular this means that all permutations in the set \(\mathcal P_m\) generate permuted van der Corput sequences with identical \(\psi\)-function.

Abstract: Substitution boxes (S-boxes) with strong and secure cryptographic properties are widely used to provide the key property of nonlinearity in block ciphers. This property is critical for resisting standard attacks, including linear and differential cryptanalysis. The ability to create a cryptographically strong S-box depends on its construction technique. This work aims to design and develop a cryptographically strong 8 × 8 S-box for block ciphers. In this work, the construction of the S-box is based on the linear fractional transformation and a permutation function. Three steps are involved in producing the S-box. In step one, an irreducible polynomial of degree eight is chosen, and all roots of the primitive irreducible polynomial are calculated. In step two, algebraic properties of the linear fractional transformation are applied in the Galois field GF(2^8). Finally, the produced matrix is permuted to add randomness to the S-box. The strength of the S-box is measured by calculating its potency to create confusion. To analyze the security properties of the S-box, some well-known and commonly used algebraic attacks are used. The proposed S-box is analyzed by the nonlinearity test, algebraic degree, differential uniformity, and the strict avalanche criterion, which comprises the avalanche effect test, completeness test, and strong S-box test. S-box analysis is done before and after the application of the permutation function, and the analysis result shows that the S-box with the permutation function has reached the optimal properties of a secure S-box.

Keywords: cryptography; substitution box; block cipher
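One of the analyses named in the abstract, differential uniformity, is simple enough to sketch. The example below does not reproduce the paper's S-box or its linear fractional transformation. As a stand-in it uses field inversion in GF(2^8) (with 0 mapped to 0), and it assumes the AES reduction polynomial 0x11B, which is irreducible but may differ from the polynomial chosen in the paper. All function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply a and b in GF(2^8) modulo the given reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # degree reached 8: reduce
            a ^= poly
        b >>= 1
    return result


def inversion_sbox():
    """Stand-in S-box: x -> x^(-1) in GF(2^8), with 0 -> 0 (brute force)."""
    sbox = [0] * 256
    for x in range(1, 256):
        for y in range(1, 256):
            if gf_mul(x, y) == 1:
                sbox[x] = y
                break
    return sbox


def differential_uniformity(sbox):
    """Largest count of x with S[x] ^ S[x ^ a] == b over all a != 0 and b."""
    n = len(sbox)
    worst = 0
    for a in range(1, n):
        counts = [0] * n
        for x in range(n):
            counts[sbox[x] ^ sbox[x ^ a]] += 1
        worst = max(worst, max(counts))
    return worst


sbox = inversion_sbox()
uniformity = differential_uniformity(sbox)  # 4 for field inversion
```

Lower values are better: a linear map such as the identity reaches the worst possible value of 256, since every input difference then maps to a single output difference.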