1. Introduction
One of the greatest theorems of mathematics states that any natural number can be represented uniquely as a product of primes [1] [2]. Over years, prime numbers have been an important topic of number theory [3] [4]. Primes exhibit numerous interesting properties and are widely used in many fields in recent years, such as data science, cryptography , color theory [5] and reliability design .
Semiprime factorization methods are to decompose a semiprime number into its two prime factors, which serve as the key to cryptography including RSA public key encryption and RSA digital signature [6]-[12].
To the best of our knowledge, almost all available semiprime factorization methods are sieve-based, which can be classified into the following three categories [13] [14]:
Category 1: The Trivial Sieve Method [15] comes from the well-known Aristotle sieve technique of utilizing exhaustive brute force to factorize a semiprime number. The Wheel Sieve Method [16] [17] identifies potential prime numbers starting from a small wheel and expanding to larger ones which improves the Trivial Sieve Method.
Category 2: The Quadratic Sieve Method [18] includes the Pollard’s Rho [19] and Lenstra Elliptic Curve techniques [20]-[22].
Category 3: The General Sieve Method [8] [23] includes the General Number Field Sieve (GNFS) and Schnorr methods [24]-[26].
Each of the above-mentioned sieve-based methods may suffer from the following limitations:
i) They often employ heuristic techniques to factorize a semiprime number, guaranteeing no convergence to a feasible solution. For example, the GNFS method depends on the specification of two polynomial functions [13] [23]. Similarly, since Pollard’s Rho method [19] is a probabilistic algorithm, it may not reach a convergent solution.
ii) Some require a good deal of pre-known information in the factorization process. For example, the Trivial Sieve method needs to know all prime numbers smaller than
beforehand to factorize a semiprime number
, while the GNFS method likely uses more than half of its computing time to detect whether a number is smooth or not [13].
iii) Sieve methods often induce high space complexity. For example, to factorize a semiprime
, the Trivial Sieve method needs to reserve a large memory space to store
pre-known primes for computations. The space complexity gets higher as
grows larger.
iv) Often needed by some of the methods are the special structure for a given semiprime. For example, the GNFS method can only factorize a semiprime with limited smoothness values [13]. Similarly, Pollard’s Rho [19] and Elliptic Curve [22] methods are special purpose algorithms, whose running time depends on the size of the smaller prime factor of a given semiprime number.
v) Wheel Sieve is an improvement of the method of Sieve of Eratosthenes. Such methods are intended to construct a prime list using the first few primes. Pritchard’s method [16], proposed in 1982, is one of the early works that explored the concept of wheels for factorization. The idea has been elaborated by others such as [17] into various forms. The wheel-sieve-based algorithms in general suffer from the difficulties of clearly identifying or distinguishing primes and composites. Consequently, the solutions usually get fuzzy when the involved numbers become large.
Recently, Li, Fang and Kuo discovered the Periodic Table of Primes (PTP) which reveals all primes and composites with no factors of 2, 3, 5 and 7 exclusively in a compact table. One unique feature of the PTP is to identify the cyclic appearance of composites through the Cyclic Table of Composites (CTC) , and then differentiate primes and composites from a feasible set like that used in the wheel sieve, but subject to no preconditions. The PTP clearly shows that the ratio of the number of primes to that of the composites approaches zero when the table size grows. An early version of the SSRN archive [28] also provided examples to highlight the potential of using the PTP for factorization.
To overcome the general difficulties and limitations of sieve-based factoring methods, we now propose an innovative scheme for semiprime factorization. By extending our previous work of building the PTP [27], we introduce the concepts of “factor pairs” and “kernel factor pairs” for fast factorization. A short table of all factor pairs listed in 48 rows and 28 columns is constructed to steer the factorization of any given semiprime under no preconditions. With the Factor-Pairs Table (Table 1) in hand, there is no need to know, calculate, and store any primes that come before
for a given semiprime
. This may significantly reduce the space complexity issue faced by some sieve-based methods. The number of potential factor pairs involved in selecting a kernel factor pair corresponding to a given semiprime
naturally leads to the design of a highly parallelable algorithm for semiprime factorization.
![]()
![]()
![]()
![]()
The proposed method is well structured for semiprimes factorization when breaking RSA encryption and readily applicable for parallel computation.
The rest of the paper is organized as follows. In Section 2, we introduce the basic theory of semiprime factorization based on the concepts of factor pair and kernel factor pair. In Section 3, we provide a framework of designing a Linear-search Factor-pair Kernel (LFK) algorithm for semiprime factorization. The proposed LFK algorithm is presented in Section 4 while a complexity analysis is given in Section 5. An illustrative example is provided in Section 6. The paper ends with conclusions and discussions in Section 7.
The following notations are used throughout the paper:
: a semiprime number
: all non-negative integers
: all natural numbers
= {
and
has no factors of 2, 3, 5 and 7}
= {
and
is a prime}: all prime numbers
= {
is a product of two primes}: all semiprime numbers
= a table of 48 rows and 28 columns consisting of all factor-pairs
= ceiling function of the smallest integer greater than or equal to
= floor function of the largest integer smaller than or equal to
2. Basic Theory
For a given number
, it is relatively easy to identify if it has any factor of 2, 3, 5 and 7. Since
, we may group every 210 numbers in one period starting from 11, i.e.,
,
,
, … Checking the numbers in
, we can identify 48 numbers which have no factors of 2, 3, 5 and 7, i.e.,
. These 48 numbers are listed as the roots (root numbers) in the following set:
A quick observation shows that 5 roots in S are composite numbers, namely,
,
,
,
and
, and the rest 43 root numbers in S are primes.
An immediate result shows that any natural number
without factors of 2, 3, 5 and 7, no matter it is a prime or composite number, can be traced back to a unique root
.
Theorem 1.
A number
if and only if there exists a unique
and
such that
.
Proof.
i) If
for any
and
, then
because
is the residue of
divided by 2, 3, 5 and 7.
ii) If
, then we let
and
. Hence there exists a unique
such that
. ◻
Theorem 1 ensures that every number
without factors of 2, 3, 5 and 7 is an offspring of one unique root
in the
-th period of 210. Here we emphasize that any prime number greater than 10 is rooted back to a unique
. Moreover, we enlist all numbers without factors of 2, 3, 5 and 7 as below:
Period
,
.
Period
,
,
Period
,
.
We now focus on semiprime numbers. Since any semiprime
is a product of two primes and it is easy to identify if
has any factor of 2, 3, 5 or 7, without loss of generality, we may limit our consideration to
. In this case, from Theorem 1, there exists a unique
,
such that
. (1)
Moreover, there must exist
,
and
,
such that
. (2)
Note that
and/or
are allowed.
Equation (1) and Equation (2) require that
Therefore,
has to be a multiple of 210. We call a factor pair with respect to
and denote the pair by where
and
. When
, we call a co-factor pair with respect to
.
Obtained through some elaborative calculations, the Factor-Pairs Table
(Table 1) shows all factor/co-factor pairs with respect to each
in the arrangement of
. For each
,
, we define .
From
, we see that (i) for
and 48,
with 4 co-factor pairs, (ii) for the rest 42
’s,
with no co-factor pairs, and (iii)
for all
.
Summarizing the above, we reach the next result.
Theorem 2.
For any given semiprime
with no factors of 2, 3, 5 and 7, i.e.,
, there exists a unique
with
and a factor pair with
,
such that
. (3)
It is important to note that since the factorization of a semiprime
is unique, the corresponding factor pair of
, i.e., , is unique. We call it the kernel factor pair associated with the semiprime
.
3. Algorithm Design
Based on Theorem 2, we introduce the overall design of an algorithm for semiprime factorization based on the kernel factor pair.
For a given semiprime
, it is easy to check if
has a factor of 2, 3, 5 or 7. Without loss of generality, we may assume that
. The root number
can be readily identified by computing
(4)
with
Once
is determined for
, then we check each factor pair to see if Equation (3) is satisfied by certain
,
. If the answer is Yes, then is a kernel factor pair for factoring
, and the existence and uniqueness of a kernel factor pair in
for
is assured by Theorem 2.
For a given factor pair , to check if it is a kernel factor pair for
is equivalent to finding
and
such that
or equivalently,
Denoting
(5)
we have
(6)
Consequently, we may perform a less desirable “quadratic search” on simultaneously to see if is a kernel factor pair for
.
4. LFK—A Linear Search Algorithm
To reduce the complexity involved in the quadratic search, we conduct further analysis to reach a “linear search” algorithm. The algorithm presented here is closely related to the theoretical framework developed in [29], which involves an innovative algorithm for general primality testing. In this paper, matching the concept of “factor pairs” with the characteristics of “semiprime”, we focus on developing an elaborated algorithm specifically for semiprime factorization.
Without loss of generality, we assume that
in Equation (6) which leads to
Together with Equation (5), we have
Hence
. This means that it is sufficient to check for .
Once
is selected, Equation (6) indicates that
(7)
should be an integer greater than or equal to
.
Our approach leads to the following LFK method for semiprime factorization.
Step1: Input a semiprime number
. If
can be fully divided by 2, 3, 5 or 7, then stop with a simple factorization.
Step2: Determine the root number
.
Step3: Determine the kernel factor pair in
.
Pick one unchecked factor pair a time in
from
, say .
Set
.
Step4: Check for possible
.
For , compute
using Equation (7).
If
is an integer and
, then output as the kernel factor pair and
. Stop the algorithm.
Step5: Check for possible
.
For
, compute
Step6: Mark the current factor pair as “checked” and Return to Step 3 for the next unchecked factor pair in
.
As a direct consequence of Theorem 2, we obtain the next result.
Corollary 3.
The proposed LFK algorithm always terminates in finite steps with a unique kernel factor pair for any given semiprime
.
5. Complexity Analysis
For any given semiprime number
, we know
for a unique pair of
and
. Except that
or
is 2, 3, 5 or 7, which can be easily verified,
has no factors of 2, 3, 5 and 7, i.e.,
. Theorem 1 assures that
has a unique root number
. Table
shows that there are either 24 or 28 factor pairs in
for a semiprime rooted at
.
Theorem 2 further confirms the existence of a unique kernel factor pair such that
can be factorized as the product of
and
.
Steps 1 - 3 of the LFK Algorithm set up the platform with simple calculations. Considering the possible relations between
and
, the desired kernel factor pair can be identified by searching through
integer values for
first and then, when needed, searching through
integer values for
. Therefore, the LFK algorithm involves at most
linear searches over
integer values. In other words, a rough estimation of the required work for factoring a semiprime
involves no more than searching through
integer values. The previous analysis leads to the next result.
Theorem 4.
The LFK algorithm requires at most 56 linear searches over the integer values in
to find a unique kernel factor pair for any given semiprime
.
It is worthy to point out that the tasks for searching the kernel factor pair out of the 28 (or 24) factor pairs in
can be conducted independently in parallel.
6. Illustrating Example
We illustrate the LFK algorithm using the following example to factor an 11-digit semiprime
.
Step1: Since
cannot be fully divided by 2, 3, 5 and 7, we know
.
Step2: Since
,
is the root number of
.
Step3: We sequentially check the 24 factor-pairs in
from
, namely,
. After taking 12 iterations going through Steps 4 and 5 to check for
without finding any feasible solution, we check . Calculate
.
Step4: Check for possible
.
Compute
for each . Since none of the
’s are integers, continue to Step 5.
Step5: Check for possible
.
Compute
for each
. When
,
is an integer. Output (71, 199) as the kernel factor pair and
.
7. Conclusions and Discussions
In this paper, we introduce the concepts of factor pairs and kernel factor pairs to form a new framework for designing an efficient semiprime factoring algorithm that serves as the key input to the RSA algorithm for computer encryption. Unlike commonly developed sieve-based methods, the proposed kernel-factor-pair-based LFK algorithm is proven to successfully factorize any given semiprime
by taking at most 56 linear searches over the integer values in [0,
]. The overall computational complexity is less than searching through
integer values with simple calculations. The LFK algorithm works under no preconditions and finds the exact factors for any given semiprime. The required search can be conducted naturally in parallel for fast computations.
As pointed out in [27], instead of eliminating the factors of 2, 3, 5 and 7 for consideration, an “extended” PTP table can be easily built by eliminating fewer factors, such as 2, 3, 5 only, or by eliminating more factors, such as 2, 3, 5, 7 and 11. Since the construction of the PTP is not sieve-based, it requires no prior knowledge or processing efforts of any other primes. In fact, an extended PTP that eliminates more factors for consideration will have a longer period but with more roots. The concepts of factor pairs and kernel factor pair follow accordingly for the LFK algorithm to prevail.
Some further discussions are as follows:
1) Notice that the LFK searches
(and
) through the integer values from 0 to
. When
becomes a huge number in
digits, the searching range could be a concern. In this case, by adopting a prime-logarithmic technique in [4], we can simply find an integer
such that
and introduce
(
) 0 - 1 binary variables
such that
can be represented by
for
.
In other words, the task of searching
through the integer values from 0 to
can be replaced by checking the values of
(about
) binary variables. For instance, to factorize a semiprime
, we only need 16 binary variables to find the solution being
However, the performance of the prime-logarithmic technique developed by Li et al. [4] strongly depends on, firstly, the way of formulating the related linear binary programming problem, and secondly, the software package used to solve the linear binary programming problem. This will remain for further study.
2) Compared with the commonly used sieve-based semiprime factoring methods, we observe that:
(1) Heuristic versus deterministic: Many sieve-based methods are heuristic and may not converge to a feasible solution. However, the proposed LFK algorithm is a deterministic approach which guarantees to factorize any semiprime number into two prime factors.
(2) High space complexity versus low space complexity: The Trial Sieve Method requires a high space complexity to store all primes that come before
, while, with the
in hand, the LFK algorithm needs at most 28 memory spaces to store the corresponding factor pairs.
(3) Hard-to-manage versus manageable parallel computing: Sieve-based methods may generate numerous subproblems which are hard to arrange for parallel processing. The LFK algorithm can easily arrange all subproblems into 24 or 28 processes to be computed parallelly.
In conclusion, fast semiprime factorization is essential for breaking RSA encryption. The kernel-factor-based LFK algorithm provides a new angle for further investigation on designing efficient factoring methods for information security.
Acknowledgements
We thank Professor Xiaohua Jia of City University of Hong Kong and an anonymous reviewer for reviewing and providing valuable comments to the article. We are grateful for the in-kind-support from NTU and the financial support for Way Kuo from the Hong Kong Institute for Advanced Study (CityU 9610556).