Sequential logic and pipelining in chip-based electronic-photonic digital computing

Zhoufeng Ying, Chenghao Feng, Zheng Zhao, Jiaqi Gu, Richard Soref, Life Fellow, IEEE, David Z. Pan, Fellow, IEEE, and Ray T. Chen, Fellow, IEEE

1 Microelectronics Research Center, The University of Texas at Austin, Austin, Texas 78758, USA
2 Computer Engineering Research Center, The University of Texas at Austin, Austin, Texas 78705, USA
3 Department of Engineering, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
* chenrt@utias.utoronto.ca

Abstract: The recent rapid progress in integrated photonics has catalyzed the development of integrated optical computing in this post-Moore's law era. Electronic-photonic digital computing, as a new paradigm to achieve high-speed and power-efficient computation, has begun to attract attention. In this paper, we systematically investigate the optical sequential logic and pipelining in electronic-photonic computing, which together offer a solution to potential problems in latency and power budget as the size of electronic-photonic computing circuits scales up considerably to achieve much more complex functions. Pipelining and sequential logic open up the possibility of high-speed very-large-scale electronic-photonic digital computing.

Index Terms: Optical computing, logic circuits, pipeline processing, optical logic devices, electro-optic devices.

1. Introduction

Integrated photonics has evolved and matured rapidly over this decade, which has not only revolutionized the optical interconnect industry[1]–[8] but also provided a solid foundation for the exploration of optical computing in this post-Moore's law era[9]–[11]. The emergence of various integrated photonic components with satisfactory performance, such as high bandwidth and low power consumption, has made electronic-photonic digital computing (EPDC) one of the most promising candidates for further improving the computation capability of chips. EPDC has been widely investigated in recent years, from fundamental logic gates[12]–[15] to functional circuits[11], [16]–[19] and logic synthesis algorithms[20]–[22]. However, the specifications and requirements of very-large-scale EPDC circuits have not yet been discussed.

Several concerns will emerge as the circuit size of EPDC scales up significantly. First, the latency accumulates along the propagation path, which at some point may wipe out the speed advantage of EPDC over its electronic counterpart as the computing circuits become much larger. Second, the circuit will also suffer from propagation loss as the light passes through more components. Unlike electronic transistors, whose signals are easily restored to the supply voltage or ground, optical signals cannot yet be regenerated by efficient and compact integrated optical amplifiers suitable for computing. These factors will ultimately limit the size of the circuits and thus narrow their applications.

In this paper, we bring sequential logic and pipelining, which have been widely used and developed in electronic very-large-scale integration (VLSI), into EPDC in order to resolve the problems of latency and loss. Optical sequential logic (OSL) and amplified sequential logic (ASL) are proposed and discussed. Combinational logic and sequential logic are then compared, followed by a thorough calculation of the latency and loss improvement obtained with the assistance of optical memory units, optical amplifiers, and pipelining.

2. Optical combinational logic

EPDC uses sophisticated electronics to control the circuits while allowing photons to travel through the waveguide circuits at the speed of light to process the information[10], [11]. Up to now, most demonstrations of EPDC, including logic gates and functional circuits[11], [12], [14], [16], [23]–[31], can be categorized as optical combinational logic (OCL). Figure 1 shows a typical on-chip EPDC circuit that covers most of these proposed structures. Electrical inputs are first fed into the electrical-optical (EO) modulators, which set the circuit into a stable state after a time \( t_{si} \). Light beams with a single wavelength or multiple wavelengths (or even with different polarizations or modes) then propagate through the circuit, taking a time \( t_{dc} \) to arrive at the other end before being received by photodetectors (photodetectors are not always required). The latency of the photodetectors is denoted \( t_{pd} \). The longest path of the combinational circuit, which determines \( t_{dc} \), is called the critical path. Combinational logic is normally presented in three ways, namely circuit diagrams, truth tables, and Boolean expressions, all of which are widely used in EPDC research[32], [33].
The following are the characteristics of the optical combinational logic. First, the output signals of the optical combinational logic are only a function of its input signals. In other words, \( \text{output} = f(\text{electrical inputs}) \). Second, output signals are valid after a certain period once the input signals become valid. Third, the circuit does not contain any kind of information-storing elements. Fourth, no kind of feedback loops exists in the circuits. The last two characteristics mean that optical combinational logic is also a time-independent logic. The circuit depicted in Figure 1 shows these characteristics of a combinational circuit perfectly. No information-storing devices or feedback loops exist in the circuit. During each clock cycle, all the electrical inputs will be applied simultaneously into the circuit and the output will be valid after a short period of time. The requirement for this delay, in other words the clock period \( t_c \), can be written as

\[
t_c \geq t_{si} + t_{dc} + t_{pd},
\]

where \( t_{si} \) is the set-up time of the combinational circuit, \( t_{dc} \) is the propagation latency of the combinational circuit, and \( t_{pd} \) is the set-up time of the photodetector array. Fortunately, state-of-the-art modulators[34]–[36] and photodetectors[37],[38] are capable of providing ultrasmall \( t_{si} \) and \( t_{pd} \), for example 10 ps or even less. Light also needs less than a picosecond to pass through one gate in EPDC. Therefore, after adding all of these contributions up, as long as the circuit size stays small, it is still possible to achieve an ultrahigh computing clock rate in the tens-of-GHz range, which is faster than the clock rate of commercial central processing units (CPUs) and graphics processing units (GPUs). However, as the photonic circuit scales up with more components and an inevitably longer critical path, \( t_{dc} \) may dominate the entire latency and limit the computing speed significantly, wiping out an essential advantage of the EPDC. Another limitation that the EPDC has to face during scaling is the optical propagation loss, which may result in an exponential increase in the required optical input power and may cause difficulties in detection. In addition, integrated photonics does not yet offer amplifiers suitable for computing that combine high bandwidth, low power consumption, high efficiency, and small footprint. The footprint of the amplifiers is mentioned here because it induces additional propagation latency as well.
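
The scaling behaviour described above can be illustrated with a minimal sketch. It assumes, purely for illustration, 10 ps for both \( t_{si} \) and \( t_{pd} \) (the example figure quoted in the text) and 0.5 ps of propagation latency per gate on the critical path; the function name and the chosen gate counts are hypothetical.

```python
def ocl_max_clock_ghz(n_gates, t_gate_ps=0.5, t_si_ps=10.0, t_pd_ps=10.0):
    """Upper bound on the clock rate of an optical combinational circuit.

    Applies t_c >= t_si + t_dc + t_pd, with t_dc modeled as the number of
    gates on the critical path times an assumed per-gate latency.
    """
    t_dc_ps = n_gates * t_gate_ps           # propagation latency of the critical path
    t_c_ps = t_si_ps + t_dc_ps + t_pd_ps    # minimum clock period in ps
    return 1e3 / t_c_ps                     # 1/ps expressed in GHz


# The clock rate drops as the critical path grows (assumed gate counts).
for n_gates in (10, 100, 1000, 10000):
    print(f"{n_gates:>6} gates: {ocl_max_clock_ghz(n_gates):.2f} GHz")
```

Under these assumed values a short critical path stays in the tens-of-GHz range, while a path of a thousand gates already falls below 2 GHz because \( t_{dc} \) dominates the sum.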

In summary, optical combinational logic, as the most straightforward form of the EPDC, has an intrinsic limitation in circuit size due to latency and loss. Switching from combinational logic to sequential logic is a natural step, as we can tell from the successful experience of the development of VLSI.

3. Optical sequential logic

Unlike optical combinational logic circuits, which depend only on the currently present inputs, optical sequential logic (OSL) circuits have built-in information-storing elements. This means that OSL circuits can take into account the current inputs as well as previous input sequences or previous outputs that have been stored. Optical sequential logic has the following characteristics. First, optical sequential circuits are constructed from combinational circuits and optical memory units, as shown in Figure 2(a), where the memory unit is shown in Figure 2(c). Second, the circuit output signals are a function of its input signals and of its stored signals. Third, the memory units in an optical sequential logic circuit are normally triggered and can store [...] periodically because each memory employs a sufficiently strong CW optical input to its EO modulator. Figure 2(d) shows the timing diagram. Within each clock cycle, all electrical signals are fed into the circuit through the EO modulators, which takes a time $t_{si}$ to stabilize. Meanwhile, the memory units are also triggered by the clock signal to send out the stored signals within a time $t_{qr}$. Once the circuits have been set by the electrical signals and the input optical signals are ready, the light beams propagate through the combinational circuit, which takes a time $t_d$, before arriving at the next memory unit, where they are stored within $t_{sr}$.

Therefore, the timing requirement for this sequential circuit with pipelining will be

$$t_c \geq \max\{t_{si}, t_{qr}\} + t_d + t_{sr},$$

where $t_d$ is the maximum delay time of all the combinational circuits, $t_{qr}$ is the clock-to-output delay time of the memory unit, $t_{sr}$ is the set-up time of the memory unit, and $t_{si}$ is the set-up time of the combinational circuit.
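
As a small illustration of this timing requirement, the sketch below evaluates the minimum clock period for a set of stage delays. All numerical values are hypothetical placeholders chosen only to exercise the formula; they are not taken from measured devices.

```python
def osl_min_clock_period_ps(t_si, t_qr, stage_delays, t_sr):
    """Minimum clock period of an optical sequential circuit.

    Implements t_c >= max{t_si, t_qr} + t_d + t_sr, where t_d is the
    largest delay among the combinational circuits between memory units.
    All times are in picoseconds.
    """
    t_d = max(stage_delays)
    return max(t_si, t_qr) + t_d + t_sr


# Hypothetical example: three combinational stages with unequal delays.
period_ps = osl_min_clock_period_ps(t_si=10.0, t_qr=8.0,
                                    stage_delays=[15.0, 20.0, 12.5],
                                    t_sr=5.0)
print(f"t_c >= {period_ps} ps, i.e. at most {1e3 / period_ps:.1f} GHz")
```

Only the slowest stage enters the bound, which is why the stages do not need to be matched in delay when a universal clock is used.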

Figure 2. (a) Diagram of sequential logic, consisting of optical combinational logic and optical memory units. (b) A typical optical sequential circuit with multiple combinational logic circuits connected with register arrays. (c) One possible realization of the optical memory unit, one line within the memory bank: PD: photodetector; EOM: electro-optic modulator. (d) Timing diagram of optical sequential logic.

The memory units, which store and retrieve optical signals in step with the clock, serve as critical building blocks of the OSL. Since integrated all-optical memory devices for this purpose are not yet available off the shelf, mature electrical memory units, especially electrical registers, are adopted here in cooperation with photodetectors and electro-optic modulators, as shown in Figure 2(c). Fortunately, CMOS-compatible fabrication technology for integrated photonics has reached a high state of development that enables electronic-photonic hybridization on the same chip with few constraints[39], [40]. In other words, electronic transistors can be fabricated anywhere on the same chip as the photonic components, for example adjacent to photodetectors and modulators. Advanced packaging techniques[41] that integrate electronic chips with photonic chips are also a promising method to achieve the hybridization of the components mentioned above. The integrated photonic components, including PDs and EOMs, can already be fabricated in most integrated photonics foundries at acceptable cost and with stable performance[42]. The hybridization of photonic and electronic components is still under development in industry, and we believe the cost will become low enough once it is widely adopted in many applications.

It is worth mentioning that many optical sequential logic gates and circuits have been reported in the all-optical logic area, such as the optical binary counter[43], [44], the circulating shift register[45], and the pattern recognition circuit[46]. As an important paradigm of optical computing, all-optical logic normally uses semiconductor optical amplifiers (SOAs) as active elements to switch the optical output by light, which differs from how EPDC operates. Nevertheless, combining the two approaches may bring more attractive functions in the future.

4. Amplified sequential logic

One advantage of synchronizing the entire OSL circuit using a universal clock is that the delay of each small OCL circuit does not have to be the same, which eases the effort of adjusting precisely the size and latency of each circuit.
In the electronic domain, the universal clock for sequential logic is adopted naturally because delay adjustment in VLSI is tedious and mature electronic registers are easily accessible. In the optical domain, however, the latency can be controlled more easily and precisely using waveguides or delay lines. For instance, a 1 μm long waveguide induces a delay of only ~10 fs. Based on this approach, amplified sequential logic (ASL) is proposed and discussed here.

Figure 3(a) depicts the diagram of the ASL. The delay of each combinational logic circuit is fine-tuned using delay lines to be the same (\( t_d = \max (t_{d1}, t_{d2}, t_{d3}, \ldots) \)). Therefore, it is guaranteed that the internal outputs arrive at the amplifiers simultaneously and, if the amplifiers are identical, are boosted and sent out at the same time. Thus, the electrical inputs can be injected with the right timing. This uniform pace produces the same effect as a universal clock.
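
The delay equalization can be sketched numerically. The snippet below uses the ~10 fs per 1 μm waveguide figure quoted above to estimate how much delay line each stage would need in order to match the slowest one; the stage delays and the function name are hypothetical, and real designs would use the actual group index of the waveguide.

```python
FS_PER_UM = 10.0  # approximate waveguide delay quoted in the text (~10 fs per 1 um)


def delay_line_lengths_um(stage_delays_ps):
    """Extra waveguide length per stage so that every stage matches the slowest.

    Each stage is padded up to t_d = max(stage delays). Delays are given in
    picoseconds and the returned lengths are in micrometers.
    """
    t_d_ps = max(stage_delays_ps)
    return [(t_d_ps - t_ps) * 1000.0 / FS_PER_UM for t_ps in stage_delays_ps]


# Hypothetical stage delays in picoseconds.
print(delay_line_lengths_um([18.0, 20.0, 15.5]))
```

The slowest stage needs no padding, and a mismatch of a few picoseconds corresponds to a few hundred micrometers of waveguide under this assumption.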

The optical amplifiers inserted in the circuits repeatedly boost the signals to tackle the propagation loss. Although an all-optical amplifier for computing in integrated photonics has yet to be developed, progress in compact, high-speed, and power-efficient optical-to-electrical-to-optical (OEO) devices makes the proposed ASL more achievable[47], [48], since an OEO repeater provides optical gain. Assuming the delay caused by the amplifier is \( t_a \), the timing requirement for the ASL, as shown in Figure 3(b), can be written as

\[
t_c \geq \max\{t_{si}, t_a\} + t_d. \quad (3)
\]

Compared to the OSL, one disadvantage of the ASL is that the clock rate is not flexible, since the delay of each small combinational logic circuit is fixed. In contrast, the OSL can run at a lower clock rate to save power in some applications, because the memory unit can hold signals much longer.

5. Pipelining

The precise and uniform pace of each combinational logic circuit in OSL and ASL makes them well suited to pipelining. Pipelining is an essential and commonly used concept in sequential logic for improving computing throughput. Partitioning a combinational circuit into N parts and operating it as a sequential circuit is called N-stage pipelining.

An example of pipelining with three stages is shown in Figure 4. A large OCL is split into three smaller OCLs, marked as \( f_1 \), \( f_2 \) and \( f_3 \). The input/job sequence is \( x_1, x_2, x_3 \) and so on. After going through the three circuits, those jobs generate the internal outputs \( u_1, u_2, u_3 \ldots \) and \( v_1, v_2, v_3 \ldots \) as well as the external outputs \( y_1, y_2, y_3 \ldots \). In the first clock cycle, \( x_1 \) is fed into the circuit and processed by circuit \( f_1 \) to generate \( u_1 \). In the second clock cycle, \( u_1 \) moves forward to the second circuit \( f_2 \) to generate \( v_1 \), while at the same time \( x_2 \) is fed into the first circuit to obtain \( u_2 \). In the third clock cycle, \( v_1 \) moves to the final block \( f_3 \) to generate the output \( y_1 \), \( u_2 \) moves to the second block \( f_2 \), and at the same time \( x_3 \) is fed into \( f_1 \). From then on, an external output is obtained in every cycle. In a large combinational circuit, the next input is fed only after the previous output becomes available, which leaves two-thirds of the circuit idle. In contrast, OSL and ASL can feed the inputs more frequently and keep all functional circuits computing at full speed without any idle time slots. We can readily draw the preliminary conclusion that N-fold pipelining improves the throughput by nearly a factor of N if the latency of the memory units or amplifiers is negligible compared to the latency of each combinational circuit.
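
The cycle-by-cycle behaviour of this three-stage example can be traced with a short sketch. It models only which job occupies which stage in each clock cycle; the function name and the use of `None` for an idle stage are illustrative choices, not part of the proposed hardware.

```python
def pipeline_schedule(jobs, n_stages):
    """Return, for each clock cycle, the job held by each stage (None if idle).

    Job j enters stage 0 in cycle j and advances one stage per clock cycle,
    so the whole sequence needs len(jobs) + n_stages - 1 cycles.
    """
    n_cycles = len(jobs) + n_stages - 1
    schedule = []
    for cycle in range(n_cycles):
        row = []
        for stage in range(n_stages):
            job_index = cycle - stage  # job that entered `stage` cycles ago
            row.append(jobs[job_index] if 0 <= job_index < len(jobs) else None)
        schedule.append(row)
    return schedule


# Stages correspond to f1, f2, f3; entries show the job currently in each stage.
for cycle, row in enumerate(pipeline_schedule(["x1", "x2", "x3", "x4"], 3), start=1):
    print(f"cycle {cycle}: f1={row[0]}, f2={row[1]}, f3={row[2]}")
```

From the third cycle onward every stage is busy and one job completes per cycle, whereas an unpipelined circuit would need three stage delays per job.
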
Now we will conduct a theoretical calculation of the latency, throughput, and propagation loss of the pipelined OSL and ASL as compared to the OCL without pipelining. Due to the similarity between the analysis of OSL and ASL in terms of pipelining, we will take ASL as an example.

**Latency.** The latency means the time required for a particular signal to travel from the beginning to the end. The total latency of OCL and ASL are

\[ L_C = t_{si} + t_{dc} + t_{pd}, \]

\[ L_S = \max\{t_{si}, t_a\} \times N + t_d \times N + t_{pd}, \]

respectively, where \( t_d = \max\{t_{di}\}, i = 1, 2, \ldots, N \).

**Throughput.** The throughput is the number of jobs the circuit can process in a given time period. The maximum throughputs of the OCL and ASL are

\[ TP_C = [t_{si} + t_{dc} + t_{pd}]^{-1}, \]

\[ TP_S = [\max\{t_{si}, t_a\} + t_d]^{-1}, \]

respectively. Assume \( t_{di} = t_d = \frac{t_{dc}}{N} \), which means each small combinational logic circuit in the ASL has equal delay and no extra delay lines are required. Also assume \( t_a = 2t_{si} = 2t_{pd} \). The latency ratio (LR) and throughput ratio (TPR) can then be calculated as

\[ LR = \frac{L_S}{L_C} = \frac{(N + 0.5) \times t_a + N \times t_d}{t_a + N \times t_d} = \frac{(N + 0.5) \times k + N}{k + N}, \]

\[ TPR = \frac{TP_S}{TP_C} = \frac{N \times t_d + t_a}{t_d + t_a} = \frac{N + k}{1 + k}, \]

where \( k = \frac{t_a}{t_d} \) is called the overhead factor (OF). It describes the latency of the memory unit or amplifier relative to the latency of the small combinational circuit.
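
The two closed-form ratios are easy to evaluate numerically. The following sketch simply codes the final expressions for LR and TPR as functions of the pipeline depth N and the overhead factor k; the function names are illustrative, and the sketch inherits the simplifying assumptions made in the derivation.

```python
def latency_ratio(n_stages, k):
    """LR = ((N + 0.5) * k + N) / (k + N), as given in the text."""
    return ((n_stages + 0.5) * k + n_stages) / (k + n_stages)


def throughput_ratio(n_stages, k):
    """TPR = (N + k) / (1 + k), as given in the text."""
    return (n_stages + k) / (1 + k)


# Sweep a few pipeline depths for two overhead factors.
for k in (0.5, 1.0):
    for n_stages in (10, 30):
        print(f"k={k}, N={n_stages}: "
              f"TPR={throughput_ratio(n_stages, k):.2f}, "
              f"LR={latency_ratio(n_stages, k):.2f}")
```

For small k the throughput ratio approaches N, while the latency ratio stays bounded by roughly 1 + k for large N, which is the trade-off that pipelining exploits.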

**Loss.** For simplicity, we consider the critical path, which has the longest latency, in the following calculation. Assume each logic bit (one or more gates per bit) has on average a latency of \( t_g \) and a loss of \( \alpha \) (in dB). The memory unit or the amplifier provides a gain of \( G \) (in dB). Therefore, if there are \( M \) bits in the critical path of each combinational logic circuit, then we have

\[ G \geq M \alpha, \]

\[ t_d = M t_g. \]

Then we obtain a constraint for the overhead factor

\[ k = \frac{t_a}{t_d} \geq \frac{\alpha t_a}{G t_g}. \]

Given four parameters, namely the delay and loss of each bit and the delay and gain of the memory unit or amplifier, we can readily obtain the overhead factor and thus the latency ratio and throughput ratio. Table 1 summarizes all the symbols discussed.
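
A minimal sketch of this calculation is given below. It takes the per-bit loss and latency together with the gain and latency of the amplifier or memory unit, picks the largest M allowed by the gain constraint, and returns the resulting overhead factor. The function names are hypothetical, and the two parameter sets are the ones listed in Table 2.

```python
import math


def max_bits_per_stage(gain_db, loss_per_bit_db):
    """Largest M satisfying G >= M * alpha (both in dB)."""
    return math.floor(gain_db / loss_per_bit_db)


def overhead_factor(amp_latency_ps, m_bits, latency_per_bit_ps):
    """k = t_a / t_d, with t_d = M * (latency per bit)."""
    return amp_latency_ps / (m_bits * latency_per_bit_ps)


# Parameter sets from Table 2: (gain in dB, amplifier latency in ps).
for label, gain_db, t_a_ps in (("amplifier", 20.0, 20.0), ("OEO", 10.0, 100.0)):
    m = max_bits_per_stage(gain_db, loss_per_bit_db=0.5)
    k = overhead_factor(t_a_ps, m, latency_per_bit_ps=0.5)
    print(f"{label}: M = {m}, k = {k}")
```

Choosing the largest allowed M gives the smallest overhead factor for a given device, since it spreads the amplifier latency over the longest possible combinational stage.
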
Table 1
Symbols and definitions.

<table>
<thead>
<tr>
<th>Symbols</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>$t_{dc}$</td>
<td>The propagation latency of a combinational circuit</td>
</tr>
<tr>
<td>$t_{di}$</td>
<td>$i = 1, 2, \cdots, N$. The propagation latency of each small combinational circuit.</td>
</tr>
<tr>
<td>$t_d$</td>
<td>$t_d = \max(t_{di}), i = 1, 2, \cdots, N$</td>
</tr>
<tr>
<td>$t_{si}$</td>
<td>The setup time of a combinational circuit</td>
</tr>
<tr>
<td>$t_{pd}$</td>
<td>The setup time of a photodetector</td>
</tr>
<tr>
<td>$t_{sr}$</td>
<td>The setup time of a memory unit</td>
</tr>
<tr>
<td>$t_{qr}$</td>
<td>The clock-to-output delay of a memory unit</td>
</tr>
<tr>
<td>$t_{re}$</td>
<td>The setup time of an electrical register</td>
</tr>
<tr>
<td>$t_{ro}$</td>
<td>The clock-to-output delay of an electrical register</td>
</tr>
<tr>
<td>$t_a$</td>
<td>The total delay of an amplifier</td>
</tr>
<tr>
<td>$L_c$</td>
<td>Total latency of a combinational circuit</td>
</tr>
<tr>
<td>$L_s$</td>
<td>Total latency of a sequential circuit</td>
</tr>
<tr>
<td>$LR$</td>
<td>Latency ratio of the entire circuit w/ and w/o pipeline</td>
</tr>
<tr>
<td>$TP_c$</td>
<td>Total throughput of a combinational logic circuit</td>
</tr>
<tr>
<td>$TP_s$</td>
<td>Total throughput of a sequential logic circuit or amplified sequential logic circuit</td>
</tr>
<tr>
<td>$TPR$</td>
<td>Throughput ratio of the entire circuit w/ and w/o pipeline</td>
</tr>
<tr>
<td>$k$</td>
<td>Overhead factor; $k = t_a / t_d$</td>
</tr>
<tr>
<td>$N$</td>
<td>Pipeline stages of a sequential circuit</td>
</tr>
<tr>
<td>$M$</td>
<td>Number of bits in critical path in each combinational circuit</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>Loss of each bit, dB unit</td>
</tr>
<tr>
<td>$t_g$</td>
<td>Latency of each bit</td>
</tr>
<tr>
<td>$G$</td>
<td>Gain of the memory unit or amplifier, dB unit</td>
</tr>
</tbody>
</table>

6. Result

Figure 5 shows the latency ratio (LR) and throughput ratio (TPR) versus the number of pipeline stages. It is worth mentioning that in the electronics domain, the pipeline depth ranges from a few stages to 30 or so, and sometimes exceeds a thousand in special applications[49]. In Fig. 5, different curves represent different overhead factors $k$. The smaller the overhead factor, the larger the throughput improvement, at the cost of a small latency increase. For example, with an overhead factor of 0.5 and $N = 30$, the throughput improves by a factor of 20.3 while the latency ratio remains below 1.5.

Some assumptions are then made to give a concrete comparison between logic circuits with and without pipelining. Table 2 covers four cases in which the number of pipeline stages is either 10 or 30 and the amplifiers are based either on foundries' semiconductor optical amplifier (SOA)[50] or on Soref's OEO[47]. Take the case of $N=30$ with the ideal amplifier as an example. The logic circuit is assumed to have 0.5 dB loss per bit and 0.5 ps latency per bit on average. The amplifier provides 20 dB gain and introduces a latency of less than 20 ps[50]. The maximum number of bits in the critical path of each combinational circuit is then 40, and the overhead factor is 1. The TPR and LR can be read from Figure 5 and turn out to be 15.5 and 1.91, respectively, at $k=1$. This means that pipelining provides a 15.5-fold improvement in throughput at the cost of a latency increase of less than a factor of 2. The circuit can run at a frequency as high as 25 GHz, and the loss is 20 dB at the photodetector because the amplifiers boost the signals at each stage except the last one. In contrast, the circuit without pipelining suffers from 580 dB loss in total and can only run at 1.61 GHz. Assume the combinational circuit has 64-bit input and output ports in a processor with a 64-bit data and address bus. The total number of bits is then estimated as $M \times N \times 64 = 76800$. Note that, thanks to the rapid development of integrated photonics, components such as PDs and EOMs can provide an OE/EO bandwidth larger than 40 GHz[42], [51], [52]. Compared to the speeds calculated in Table 2, the component bandwidth will not be the bottleneck. This approach provides a promising solution for EPDC to cascade tens of thousands of logic gates while still running at a very high clock rate.
Figure 5. The latency ratio (LR) and throughput ratio (TPR) with respect to the pipelining stage depth N for different overhead factors.

Table 2
Examples of the specifications based on different assumptions.

<table>
<thead>
<tr>
<th>Category</th>
<th>Parameter</th>
<th>Unit or symbol</th>
<th>Specs with amplifiers</th>
<th>Specs with OEO</th>
</tr>
</thead>
<tbody>
<tr>
<td>Logic circuit characteristics</td>
<td>Loss per bit</td>
<td>dB</td>
<td>0.5</td>
<td>0.5</td>
</tr>
<tr>
<td></td>
<td>Latency per bit</td>
<td>ps</td>
<td>0.5</td>
<td>0.5</td>
</tr>
<tr>
<td>Register characteristics</td>
<td>Gain</td>
<td>dB</td>
<td>20</td>
<td>~10</td>
</tr>
<tr>
<td></td>
<td>Latency</td>
<td>ps</td>
<td>20</td>
<td>~100</td>
</tr>
<tr>
<td>Result with pipeline</td>
<td>Overhead factor</td>
<td>k</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td></td>
<td># of bits</td>
<td>M</td>
<td>40</td>
<td>20</td>
</tr>
<tr>
<td></td>
<td>Pipeline stage (the rows below list the amplifier-based case, k = 1, for each N)</td>
<td>N</td>
<td>10</td>
<td>30</td>
</tr>
<tr>
<td></td>
<td># of total bits</td>
<td>M x N x 64</td>
<td>25600</td>
<td>76800</td>
</tr>
<tr>
<td></td>
<td>Throughput improvement</td>
<td>TPR</td>
<td>5.5</td>
<td>15.5</td>
</tr>
<tr>
<td></td>
<td>Latency increase</td>
<td>LR</td>
<td>1.77</td>
<td>1.91</td>
</tr>
<tr>
<td></td>
<td>Highest clock rate</td>
<td>GHz</td>
<td>25</td>
<td>25</td>
</tr>
<tr>
<td></td>
<td>Loss</td>
<td>dB</td>
<td>20</td>
<td>20</td>
</tr>
<tr>
<td>Result w/o pipeline</td>
<td>Highest clock rate</td>
<td>GHz</td>
<td>4.55</td>
<td>1.61</td>
</tr>
<tr>
<td></td>
<td>Loss</td>
<td>dB</td>
<td>180</td>
<td>580</td>
</tr>
</tbody>
</table>
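
The clock rates and bit counts of the amplifier-based case ($k = 1$, $M = 40$) can be checked with a short sketch. It assumes that the set-up times of the modulators and photodetectors are each half of the 20 ps amplifier latency, an assumption carried over from the analysis in Section 5; the function names are illustrative.

```python
def pipelined_clock_ghz(t_si_ps, t_a_ps, t_d_ps):
    """Clock rate with pipelining: 1 / (max{t_si, t_a} + t_d), in GHz."""
    return 1e3 / (max(t_si_ps, t_a_ps) + t_d_ps)


def unpipelined_clock_ghz(t_si_ps, t_pd_ps, t_d_ps, n_stages):
    """Clock rate without pipelining: 1 / (t_si + N * t_d + t_pd), in GHz."""
    return 1e3 / (t_si_ps + n_stages * t_d_ps + t_pd_ps)


T_A_PS = 20.0            # amplifier latency from Table 2
T_SI_PS = T_PD_PS = 10.0 # assumed: half of the amplifier latency each
M_BITS = 40              # bits per stage on the critical path
T_D_PS = M_BITS * 0.5    # 0.5 ps latency per bit

for n_stages in (10, 30):
    print(f"N={n_stages}: "
          f"pipelined {pipelined_clock_ghz(T_SI_PS, T_A_PS, T_D_PS):.2f} GHz, "
          f"unpipelined {unpipelined_clock_ghz(T_SI_PS, T_PD_PS, T_D_PS, n_stages):.2f} GHz, "
          f"total bits {M_BITS * n_stages * 64}")
```

The pipelined clock rate is independent of N because only one stage delay and one amplifier delay enter the clock period, whereas the unpipelined rate falls as the critical path grows.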

Besides the loss and latency discussed above, other factors should be considered in a real circuit. One example is the amplified spontaneous emission (ASE) from the amplifiers in the ASL. As the number of amplifiers in Fig. 3 increases, the optical signal quality, measured by the signal-to-noise ratio (SNR), deteriorates because the noise is amplified by each amplifier, which also puts an upper bound on the number of pipeline stages. Given that the attenuation of stage $i$ is $A_i$, the gain of its amplifier is $G_i$, and the ASE noise power of each stage is $P_{ASE}$, we can write the signal and noise output power as [53]:

$$P_s = P_{in} \prod_{i=1}^{N} A_i G_i, \quad (13)$$

$$P_n = P_{ASE} \left( \sum_{j=1}^{N-1} \prod_{i=j}^{N-1} A_i G_i + 1 \right). \quad (14)$$

Assuming $A_i G_i = 1$, we have

$$SNR = \frac{P_s}{P_n} = \frac{P_{in}}{NP_{ASE}}. \quad (15)$$
which indicates that the SNR decreases as the number of stages increases. With all these parameters considered, this equation sets a requirement on the laser input and the amplifiers. For example, when the input power is 10 dBm and the attenuation and gain of each stage are both 20 dB, we know from [54] that the ASE noise power will be less than -30 dBm. As a result, the SNR will still be larger than 25 dB even after 30 stages, which indicates that the ASE noise will not be the bottleneck in this case. The optical SNR of the OEO is generally higher than that of the SOA because the SOA produces ASE noise that is not found in the OEO.
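
The 30-stage estimate can be reproduced with a few lines. The sketch below evaluates the SNR expression in decibels, using the 10 dBm input power and the -30 dBm bound on the ASE noise power quoted above; since -30 dBm is an upper bound on the noise, the result is a lower bound on the SNR. The function name is illustrative.

```python
import math


def snr_db(p_in_dbm, p_ase_dbm, n_stages):
    """SNR = P_in / (N * P_ASE), expressed in dB.

    Valid under the assumption A_i * G_i = 1 for every stage.
    """
    return p_in_dbm - p_ase_dbm - 10.0 * math.log10(n_stages)


# Example from the text: 10 dBm input, ASE below -30 dBm per stage.
for n_stages in (1, 10, 30):
    print(f"N={n_stages}: SNR >= {snr_db(10.0, -30.0, n_stages):.2f} dB")
```

Because the noise grows only linearly with N, the SNR drops by about 10 dB for every tenfold increase in the number of stages.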

7. Conclusion
We have proposed optical sequential logic and amplified sequential logic, together with pipelining, to tackle the latency and loss issues that electronic-photonic digital computing will face when its circuit size scales up to include numerous components in order to achieve much more complex functions. The analysis shows that, with the help of pipelining, electronic-photonic circuits can improve their computing throughput severalfold at the cost of a very small latency increase. Pipelining and optical amplifiers also enable the circuits to operate at a much higher clock rate while easing the problem of propagation loss. Sequential logic and pipelining open up the possibility of very-large-scale electronic-photonic digital computing circuits.

Acknowledgements
The authors acknowledge support from the Multidisciplinary University Research Initiative (MURI) program through the Air Force Office of Scientific Research (AFOSR) (Grant No. FA 9550-17-1-0071, FA9550-19-1-0341).

References


