Projects: FFT Formal Proof v0 (old) - StanfordVLSI/Genesis2 GitHub Wiki
Caveat 1: The proof is obviously unfinished. I believe there is enough information in what's been written so far to properly construct the proof, but clearly I haven't yet made the time to do that myself. But I'm actively working on it! Come back later if you want to see the complete final proof. Or send me mail and I can notify you when it's done.
Caveat 2: The Stanford FFT generator, which I authored, may not yet actually use the algorithm that results directly from this proof. But I plan to do that eventually...!
A detailed description of the problem can be found in the 2013 VLSI-SoC paper
- An area-efficient minimum-time FFT schedule using single-ported memory, Richardson, Shacham, Markovic and Horowitz. (To appear in) Proceedings of the 21st IFIP/IEEE International Conference on VLSI (VLSI-SoC), 2013.
That is, suppose we have eight datapoints d0 through d7 that we want to transform from the time domain to the frequency domain. In Stage 0 of operation, a radix-2 butterfly unit reads datapoints d0 and d1, producing two results that then overwrite the original locations of d0 and d1 in memory. Similar operations occur for datapoint pairs (d2,d3), (d4,d5) and (d6,d7) respectively. In Stage 1, the datapoints are processed with stride-two pairings e.g. (d0,d2), (d4,d6), (d1,d3) and (d5,d7). And finally Stage 2 completes the operation, re-walking the data by stride 4. Table 1 summarizes this access pattern.
TABLE 1 Stage 0: (d0,d1) (d2,d3) (d4,d5) (d6,d7) Stage 1: (d0,d2) (d4,d6) (d1,d3) (d5,d7) Stage 2: (d0,d4) (d1,d5) (d2,d6) (d3,d7)
If the butterfly unit is pipelined such that result writebacks occur at the same time as operand reads, then datapoints (d0,d1) get written to memory at the same time as (d2,d3) are being read from memory. For memory to be able to supply all four datapoints at once, it needs to be four-ported. Alternatively, we could try to use two separate banks of two-ported memory or even four banks of single-ported memory. Of these three choices, four banks of single-ported memory is most efficient in terms of area, so we'd like to make that work if possible.
Now: Within any stage, look at any sequential group of four operands starting on a mod-two boundary. We'll call such a group a quad. If the quad happens to start on a mod-four boundary, we'll call this an aligned quad. Stage 0 has three overlapping quads: (d0,d1,d2,d3), (d2,d3,d4,d5) and (d4,d5,d6,d7), the outside two of which are aligned quads. Our goal is to map each datapoint to one of four memory banks (m0,m1,m2,m3) such that every quad contains exactly one of each bank number. For example, we could map datapoints (d0,d1,d2,d3,d4,d5,d6,d7) to memory banks (m0,m1,m2,m3,m0,m1,m2,m3) respectively, and thus fulfill our goal for each of the three quads in Stage 0:
- (d0,d1,d2,d3) maps to (m0,m1,m2,m3)
- (d2,d3,d4,d5) maps to (m2,m3,m0,m1)
- (d4,d5,d6,d7) maps to (m0,m1,m2,m3)
We're going to fulfill our goal by enforcing an order to each aligned quad, such that every aligned quad within a stage has the same memory bank ordering e.g. (m0,m1,m2,m3). This means that every unaligned quad will also have a consistent ordering, e.g. (m2,m3,m1,m0), and thus we meet our goal of uniqueness for all quads. This property will hold whether the transform is operating on eight datapoints, with two aligned quads per stage, or eight thousand datapoints, with two thousand quads per stage.
In theory the four operations in a given stage all happen simultaneously. In practice, it's not always practical to have enough butterfly units for that. So for instance, if we only have one butterfly unit, it will have to be used four times in succession to complete the transformation for a given stage, using the following four operations:
- read, process and writeback datapoints d0 and d1;
- read, process and writeback datapoints d2 and d3;
- read, process and writeback datapoints d4 and d5;
- read, process and writeback datapoints d6 and d7.
Insight 1: We can ensure that each of the four memory banks m0, m1, m2 and m3 are represented in every quad, in some order
Once again, here is our schedule for N=8 datapoints:
- Stage 0: (d0,d1) (d2,d3) (d4,d5) (d6,d7)
- Stage 1: (d0,d2) (d4,d6) (d1,d3) (d5,d7)
- Stage 2: (d0,d4) (d1,d5) (d2,d6) (d3,d7)
Given this information, we can assign banks to operands such that banknum = PoPe where Po is the parity of ALL odd bits and Pe is the parity of ALL even bits. This means that any given sequential block of four operands will map to four separate banks.
In our example quad [(d1,d3),(d5,d7)], or [(00001,00011),(00101,00111)]
- d1: Po(00001) = parity(0,0) = 0; Pe(00001) = parity(0,0,1) = 1; PoPe = 01; banknum m = 1
- d3: Po(00011) = parity(0,1) = 1; Pe(00011) = parity(0,0,1) = 1; PoPe = 11; banknum m = 3
- d5: Po(00101) = parity(0,0) = 0; Pe(00101) = parity(0,1,1) = 0; PoPe = 00; banknum m = 0
- d7: Po(00111) = parity(0,1) = 1; Pe(00111) = parity(0,1,1) = 0; PoPe = 10; banknum m = 2
Insight 2: We can reorder the quads such that each one contains the four memory banks m0, m1, m2 and m3 in a consistent order
When the stage number s is even, then Pe within any group of four will be either (0,1,0,1) or (1,0,1,0) depending on the (constant 0 or 1) parity of the non-toggling bits. Why? Because we know that bit ds within any group of four always counts (0,1,0,1) while the remaining (non-toggling) even bits remain constant.
Similarly, Po will be either (0,0,1,1) or (1,1,0,0). Bank number PoPe for a quad can thus be one of the following four groups: (00,01,10,11); (01,00,11,10); (10,11,00,01); or (11,10,01,00).
When the stage number is odd, we know that b(s) is an odd bit and thus it is Po that will be either (0,1,0,1) or (1,0,1,0) while Pe is (0,0,1,1) or (1,1,0,0). Bank number PoPe for a quad will thus be one of: (00,10,01,11); (01,11,00,10); (10,00,11,01); or (11,01,10,00).
Suppose, that, when stage number s is even and when Pe is going to count (1,0,1,0), we change ds (a component of Pe because s is even) such that it counts (1,0,1,0) instead of (0,1,0,1). In other words, instead of (d1,d3,d5,d7), where d1 goes (0,1,0,1) we change the sequence to (d5,d7,d1,d3), so that d1 goes (1,0,1,0). Is it okay to change the sequence like this? Yes it is because this is a valid reordering, as discussed above. And now Pe for the new ordering counts (0,1,0,1) instead of the original (1,0,1,0).
Further suppose that, when stage number s is even and when Po is about to count (1,1,0,0), we change ds+1 (a component of Po, because we know s+1 is odd) such that it counts (1,1,0,0) instead of (0,0,1,1). (In our example sequence (d1,d3,d5,d7) we would not need to make this change, because Po already counts the way we want it to.) In general, is it okay to change the sequence like this when needed? Yes it is because within any given group of four, the order of operations is unimportant; also the order of pairs within a given operation is unimportant. In other words, looking again at the last four operands in Stage 1 of our example above, the given order is (d1,d3)(d5,d7) but all EIGHT of the following possible sequences are equally valid:
* (d1,d3)(d5,d7); (d1,d3)(d7,d5); (d3,d1)(d5,d7); (d3,d1)(d7,d5); (d5,d7)(d1,d3); (d5,d7)(d3,d1); (d7,d5)(d1,d3); (d7,d5)(d3,d1).
Let's define Pi such that
- Pi = Pe when i is even, and
- Pi = Po when i is odd
In our example, (d1,d3,d5,d7) is part of an odd stage; Po=(0,1,0,1) and Pe=(1,1,0,0). Because Pe counts incorrectly (1,1,0,0) instead of (0,0,1,1), we replace b(s+1) with Pe to come up with a new ordering (d5,d7,d1,d3). Now Po=(0,1,0,1) and Pe=(0,0,1,1) and final bank number assignments read (00,10,01,11), which is the correct canonical order for an odd stage.
Does this make sense yet?
This is the core of the proof. However, we still need to place it in a formal context. Also, we must discuss:
- the special cases for the boundary condition when s and s+1 wrap around at the end, that is when s == S-1.
- how the proof extends to any power-of two values for B, N or r maybe;
- and, for advanced work, we should mention how the proof extends to any pipeline depth.
Note that our insight regarding quads changes slightly in the final stage of operation, when s = S-1. In all previous stages up to that point, the toggling bits are adjacent (see example below).
Stage 0 | Stage 1 | Stage 2 | Stage 3 |
---|---|---|---|
|
|
|
|
d0=0000 | d0=0000 | d0=0000 | d0=0000 |
d1=0001 | d2=0010 | d2=0100 | d4=1000 |
d2=0010 | d4=0100 | d4=1000 | d1=0001 |
d3=0011 | d6=0110 | d6=1100 | d5=1001 |
|
|
|
|
d4=0100 | d1=0001 | d1=0001 | d2=0010 |
d5=0101 | d3=0011 | d3=0101 | d6=1010 |
d6=0110 | d5=0101 | d5=1001 | d3=0011 |
d7=0111 | d7=0111 | d7=1101 | d7=1011 |
|
|
|
|
... | ... | ... | ... |
d7=1110 | d7=1101 | d7=1011 | d7=0111 |
d7=1111 | d7=1111 | d7=1111 | d7=1111 |
At the final stage s=S-1, however, the toggling bits wrap around, such that the toggling bits are the LSB and the MSB respectively, i.e. bits d0 and ds instead of the usual bits ds+1 and ds (because in this case ds+1 does not exist).
This wrap works fine if the number of bits is even, in which case d0 is naturally part of Pe and ds part of Po, same as in the case where the toggling bits are adjacent. Later we will discuss the case where then number of bits is odd.
Hmm. The derivation below is WRONG. It should go more like this:
Given
- a transform with N datapoints such that N is a power of two and N β₯ 16; and
- an even number of stages S, that is, S=log2(N) is an even number where N is the number of datapoints to be transformed (note that the number of address bits is also equal to S).
The following equation(s) transform a given stage's datapoint sequence
- (dp(N-1) dp(N-2)...dp1,dp0)
- (dp'(N-1) dp'(N-2)...dp'1,dp'0)
- (mb'(N-1) mb'(N-2)...mb'0,mb'1)
- (m0,m1,m2,m3, m0,m1,m2,m3 ... m0,m1,m2,m3),
So:
For each S-bit datapoint dps,i in a given transform stage s β (0..S-1),
- dps,i = dS-1dS-2...d1d0
- dp's,i = d'S-1d'S-2...d'1d'0
if (s==S-1) then | d'0 = P0 and d's = P1 | and all other d'j = dj , where j β (1..S-2) |
else | d's = Ps and d'(s+1) = P(s+1) | and all other d'j = dj , where j β (0..S-1) and j β {s,s+1} |
and
Pk = P0 = Peven | if k is even and |
Pk = P1 = Podd | if k is odd |
and
- P0 = Peven = Parity(dS-2,dS-4...d2,d0)
- P1 = Podd = Parity(dS-1,dS-3...d3,d1)
The equations work for
- a transform with N datapoints such that N is a power of two and N β₯ 16; and
- an even number of stages S, that is, S=log2(N) is an even number where N is the number of datapoints to be transformed (note that the number of address bits is also equal to S).
Datapoint d (decimal) |
Address (binary) = d3d2d1d0 |
P1,P0 |
Datapoint d' = Replace ds+1ds (d1d0) with P1P0 (reorder) |
P'1,P'0 | bank = P'1P'0 |
---|---|---|---|---|---|
d00 d01 d02 d03 |
0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 |
0,0 0,1 1,0 1,1 |
0 0 0 0 = d00 0 0 0 1 = d01 0 0 1 0 = d02 0 0 1 1 = d03 |
0,0 0,1 1,0 1,1 |
m0 m1 m2 m3 |
d04 d05 d06 d07 |
0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 |
0,1 0,0 1,1 1,0 |
0 1 0 1 = d05 0 1 0 0 = d04 0 1 1 1 = d07 0 1 1 0 = d06 |
0,0 0,1 1,0 1,1 |
m0 m1 m2 m3 |
d08 d09 d10 d11 |
1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 |
1,1 1,0 0,1 0,0 |
1 0 1 1 = d11 1 0 1 0 = d10 1 0 0 1 = d09 1 0 0 0 = d08 |
0,0 0,1 1,0 1,1 |
m0 m1 m2 m3 |
d12 d13 d14 d15 |
1 1 0 0 1 1 0 1 1 1 1 0 1 1 1 1 |
1,0 1,1 0,0 0,1 |
1 1 1 0 = d14 1 1 1 1 = d15 1 1 0 0 = d12 1 1 0 1 = d13 |
0,0 0,1 1,0 1,1 |
m0 m1 m2 m3 |
If the number of bits is odd, we have a slight problem. Everything is fine as before from stage 0 up through the penultimate stage S-2. But then things change for the final stage S-1, where the toggling bits are the two outside bits dS-1 (fast toggle 0,1,0,1) and d0 (slow toggle 0,0,1,1). In the previous case, where there were an even number of address bits, we knew that dS-1 was odd and d0 was even, and thus Podd would toggle 0,1,0,1 (or 1,0,1,0, depending on the value of the non-toggling bits) while Peven would toggle 0,0,1,1 (or 1,1,0,0).
In the case where the number of bits is odd, however, the two outside (toggling) bits dS-1 and d0 are both even. Thus, as d0dS-1 count (00,01,10,11), Podd does not change at all (because the toggling bits are both even) while Peven counts either (0,1,1,0) (or (1,0,0,1), depending on the value of the non-toggling bits) instead of its usual 0-0-1-1 (or 1-1-0-0). Thus, if base the memory bank number on PoddPeven we only get two memory banks repeated twice, e.g. 00-01-00-01, instead of the desired mapping to four separate memory banks 00-01-10-11
Talk about what happens in final stage when s==nbits and nbits is odd.
Stage 0 | Stage 1 | Stage 2 |
---|---|---|
d0=000 | d0=000 | d0=000 |
d1=001 | d2=010 | d4=100 |
d2=010 | d4=100 | d1=001 |
d3=011 | d6=110 | d5=101 |
d4=100 | d1=001 | d2=010 |
d5=101 | d3=011 | d6=110 |
d6=110 | d5=101 | d3=011 |
d7=111 | d7=111 | d7=111 |
Try this: for stage s=S-1, instead of
- d_s = d_(S-1) = P_s = P_1 (because S is even and thus S-1 is odd)
- d_(s+1) = d_0 = P_(s+1) = P_0,
do this maybe
- d_s = d_(S-1) = xor(P_s,d_0) = xor(P_(S-1),d_0) = xor(P_0,d_0) (because S is odd and thus (S-1) is even) (also note xor(P_0,d_0) is same as P_0) (right?)
- d_(s+1) = d_S = d_1 (why? because d_(S-1)==d_0 sorta?) (because
- d_1 = xor(P_(s+1),d_0) = xor(P_1,d_0)
Equation for mapping addresses to banks for one butterly (B==1) at radix 2 (r==2)
Projects: FFT_Formal_Proof_-_Auxiliary_Info#Sketching_out_the_remainder_of_the_proof
Eventually we will find and show that, by extension, if we have
- an FFT with B butterflies (and thus M=4B memory banks);
- a transform with S=log2(N) stages such that modMbits(S)=0, where Mbits=log2 (M)
- (e.g. B=1βM=4βMbits=2 and modMbits(S)=mod2(S)=0βnumber of stages S is even);
everything from here down is wrong, i think; should probably instead copy and modify finished algorithm 1 from up above
The mapping of any given datapoint
- d =dS-1dS-2...d1d0
- m=mMbits-1mMbits-2...m1m0
- mMbits-1 = P(Mbits-1)/Mbits
- mMbits-2 = P(Mbits-2)/Mbits
- m1 = P1/Mbits
- m0 = P0/Mbits
- PMbits-1/Mbits = Parity(dS-1,dS-Mbits-1...d2Mbits-1,dMbits-1)
- PMbits-2/Mbits = Parity(dS-2,dS-Mbits-2...d2Mbits-2,dMbits-2)
- P1/Mbits = Parity(dS-(Mbits-1),dS-(2Mbits-1)...dMbits+1,d1)
- P0/Mbits = Parity(dS-(Mbits),dS-(2Mbits)...dMbits,d0)
We will try to use symbols consistently throughout this writeup, as described by this alphabetically-ordered reference table.
Italicized symbols are numbers, non-italicized symbols are names.
lower-case symbols are variables, UPPER-CASE symbols are constant (for a given FFT).
Symbol | Meaning | Notes |
---|---|---|
B | How many butterfly units |
To complete a transform for a given number of datapoints N>=8, an FFT with one butterfly unit (B=1) generally takes four times as long as an FFT with four butterfly units (B=4). |
di | Datapoint bit number | d0 is always the least-significant bit (LSB). Example: datapoint decimal d13 has four bits d3d2d1d0 = 1101 |
di | A specific datapoint | An eight-point transform has datapoints d0 through d7 |
i | Unknown integer | We usually start numbering from i=0 |
M | How many memory banks | For purposes of this proof M=4B |
m | Memory bank | Generally need four memory banks per butterfly e.g. m=0 through m=7 for two butterflies (B=2). |
mi | A specific memory bank | Generally need four memory banks per butterfly e.g. m0 through m7 for two butterflies (B=2). |
mi | Memory bank bit number | Generally need four memory banks per butterfly e.g. m2m1m0={000,001 ... 111} for two butterflies (B=2). |
Mbits | How many bits in each mem bank number m | Mbits = log2(M) |
N | Number of datapoints to transform | A power of two, e.g. 8, 16, 1024 or 8096 |
Po or Po | Parity of odd bits | Example: decimal 13 has four bits d3d2d1d0 = 1101; thus Po is parity(d3,d1) = parity(1,0) = 1 |
Pe or Pe | Parity of even bits | Example: decimal 13 has four bits d3d2d1d0 = 1101; thus Po is parity(d2,d0) = parity(1,1) = 0 |
R | Radix | Default is 2 |
S | Number of stages in a transform | An eight-point transform has three stages, so S=3 |
s | Stage number | An eight-point transform has three stages s=0, s=1 and s=2. |
For technical reasons (laziness actually), italics and subscripts might occasionally be dropped, e.g. you'll see d0 instead of d0 etc. I'll hope to fix this later...
Nomenclature section should be referenced way up front at the beginning.
OLD: Establish meaning of symbols s (stage number), n_s (how many stages), b_i (bit number i, where b_0 is LSB and b_S is MSB maybe), n_b, r, n_B etc.
Wanna be done by FRIIIIIDAYYYYYY! Not gonna happen.
What will be done by Friday:
- checkoff list for what will be done when (see below)
Target Actual Finish Finish Task --------- --------- ---------------------------------------------- Tue 11/19 Tue 11/19 Move the old stuff to an auxiliary page. DONE! ... Tue 11/19 Tue 11/19 Clean up the outline, figure out where to put the "nomenclature" section (see below) ... Tue 11/19 xxx xx/xx Build a timeline for what gets done when, working like twelve hours a week or whatever. ... Tue 11/19 Tue 11/19 Start on nomenclature section: what is N, n, N_B etc. ... Thu 11/19 Tue 11/19 Finish nomenclature section: what is N, n, N_B etc. ... Thu 11/19 xxx xx/xx Continue with the proof, in order of sections not finished ... TBD List of sections, including when each one will be done ... xxx xx/xx xxx xx/xx Section 5 xxx xx/xx xxx xx/xx 5.1 The equation so far xxx xx/xx xxx xx/xx 5.2 Boundary condition, odd number of bits xxx xx/xx xxx xx/xx 5.3 Boundary condition, even number of bits xxx xx/xx xxx xx/xx 5.4 Final equation for r==2 and B==1 xxx xx/xx xxx xx/xx Final polish of section 5 ... xxx xx/xx xxx xx/xx Section 6 xxx xx/xx xxx xx/xx 6.1 Extension to more than one butterfly unit (B > 1). xxx xx/xx xxx xx/xx 6.2 Extension to radix r > 2. xxx xx/xx xxx xx/xx 6.3 Extension to longer pipelines.
Later: replace html page with a latex page for publication.
How to sign up for notification of when the proof is finished.
Send me mail, I'll put you on a list and notify you when the proof is done (link to profile page on vlsiwebsite). Or maybe look at the schedule (link to schedule section above/below).
Go back and recheck entire proof, see if it is consistent with nomenclature section 9.
Make sure FFT generator (correctly) uses algorithm described in the formal proof!
Hand-written notes where the proof originally was developed:
https://vlsiweb.stanford.edu/mediawiki/index.php/File:Formal_proof_notes.pdf
That algorithm as implemented in the Perl test program:
############################################################################ # E.g. for 8 datapoints # stage 0 ($op1,$op2) will be {(0,1),(2,3),(4,5),(6,7)} maybe # stage 1 ($op1,$op2) will be {(0,2),(4,6),(1,3),(5,7)} maybe # stage 2 ($op1,$op2) will be {(0,4),(1,5),(2,6),(3,7)} maybe # Normal treatment for stages 0 through nstages-2 if ($s != ($nstages-1)) { $op1 = smart_replace($op1, $s); $op1 = smart_replace($op1, $s+1); $op2 = smart_replace($op2, $s); $op2 = smart_replace($op2, $s+1); } # Last stage nstages-1 needs special treatment. elsif ($s%2 == 1) { # Even number of bits; use zero instead of s+1 maybe. $op1 = smart_replace($op1, $s); $op1 = smart_replace($op1, 0); $op2 = smart_replace($op2, $s); $op2 = smart_replace($op2, 0); } else { # Odd number of bits; gotta do sumpm crazy maybe. $op1 = weird_replace($op1, $s); $op2 = weird_replace($op2, $s); } # Swizzle to assign correct bank & row to $op1, $op2 my ($b1, $r1) = swizzler::swizzle($op1, $npoints, 4*$nunits); my ($b2, $r2) = swizzler::swizzle($op2, $npoints, 4*$nunits); ############################################################################## sub smart_replace { # In the given number $n, replace bit $i with new bit $b # If $i is odd, new bit $b is the parity of the odd bits of $n # If $i is even, new bit $b is the parity of the even bits of $n # my $n = shift; my $i = shift; my $newbit = $i%2 ? parity_odd($n) : parity_even($n); $n = replace_bit($n, $i, $newbit); # Put $newbit into $n at position $i return $n; } sub weird_replace { # Given number $n and ultimate stage $s, where $s is even, do some weird $#!? my $n = shift; my $s = shift; # This is supposed to work at least for when nbits(datapoints) is odd. i think. my $b0 = ($n & 0x1); my $b1 = ($n & 0x2) >> 1; my $new_top_bit = parity_even($n) ^ $b1; my $new_b1 = parity_odd($n) ^ $b1 ^ $b0; my $new_b0 = $b1 ^ $b0; # Complicated enough yet!? $n = replace_bit($n,$s,$new_top_bit); # Put $new_top_bit into $n at position $s $n = replace_bit($n, 1,$new_b1); # Put $new_b1 into $n at position 1 $n = replace_bit($n, 0,$new_b0); # Put $new_b0 into $n at position 0 (LSB right?) return $n; }
Projects: FFT Formal Proof v0 (old)
Projects: FFT Formal Proof - Auxiliary Info
Projects: FFT_Formal_Proof_-_Auxiliary_Info#Trash
[Summarize]
NEXT: ADD LINK TO TAVALA PAPER HERE FOR REFERENCE
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1205957
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1205957&tag=1
http://genesis2.stanford.edu/mediawiki/index.php/File:Takala_equations.jpg
Datapoint | Address (binary) | P1,P0 | bank |
---|---|---|---|
d00 d01 d02 d03 |
0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 |
0,0 0,1 1,0 1,1 |
m0 m1 m2 m3 |
d04 d05 d06 d07 |
0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 |
0,1 0,0 1,1 1,0 |
m1 m0 m3 m2 |
d08 d09 d10 d11 |
1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 |
1,1 1,0 0,1 0,0 |
m3 m2 m1 m0 |
d12 d13 d14 d15 |
1 1 0 0 1 1 0 1 1 1 1 0 1 1 1 1 |
1,0 1,1 0,0 0,1 |
m2 m3 m0 m1 |
where
- if (s==S-1) then d'_0 = P_0 and d'_s = P_1 and all other d'_j = d_j
- else d'_s = P_s and d'_(s+1) = P_(s+1) and all other d'_j = d_j
Given a transform with an even number of stages S, that is, S=log2(N) is an even number where N is the number of datapoints to be transformed (note that the number of address bits is also equal to S); for one butterfly (B=1) we should have four memory banks (M=4) numbered 00 to 11 i.e. m1m0={00,01,10,00}. The mapping of any given datapoint
d=dS-1dS-2...d1d0
- m=m1m0
- m1 = P1 = Podd
- m0 = P0 = Peven
The equations thus work for number of datapoints N=16,64,256... and thus log(N)=4,6,8... (even) but not for N=8,32,128...where log(N)=3,5,7... is odd.
need to define Pi for the range i=0 to S and number of bits N
then we can say
where
- if (j==s) then d'j = Pj
- else d'j = dj
- if (j==S-1) then Pj = PS-1
- dp'i = d'S-1d'S-2...d'1d'0
- if s==i and s==S-1 then i is odd and di = Ps = P1 = Podd and d0 = Ps = P0 = Peven
- else if s==i and s!=S-1) then di = Ps and di+1 = P's+1