FloPoCo is a command-line tool. The general syntax is

`flopoco <options> <operator specification list>`

FloPoCo will generate a single VHDL file (named by default `flopoco.vhdl`) containing synthesisable descriptions of all the operators listed in `<operator specification list>`, plus possibly sub-operators instantiated by them.
To use these operators in your design, just add this generated file to your project.

FloPoCo will also issue a report with useful information about the generated operators, such as the pipeline depth. In addition, three levels of verbosity are available.

To obtain a concise list of the available operators and options, simply type

`./flopoco`

`./flopoco IntConstMult wIn=16 c=12345`

produces a file `flopoco.vhdl` containing a single operator for the integer multiplication of an input 16-bit number by the constant 12345.
The VHDL entity is named after the operator specification, here `IntConstMult_16_12345`.

`./flopoco IntConstMult wIn=16 c=12345 IntConstMult wIn=16 c=54321`

produces a file `flopoco.vhdl` containing two VHDL entities and their architectures, for the two given constant multipliers.
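When FloPoCo is driven from a script, the operator specification list can be assembled programmatically. The sketch below only builds the argument vector for the two-multiplier example above. The helper name `flopoco_argv` is hypothetical and not part of FloPoCo, and the `./flopoco` path is assumed to match the examples.

```python
def flopoco_argv(operators, executable="./flopoco"):
    """Build an argument list for a FloPoCo run.

    `operators` is a list of (operator_name, parameters) pairs, where
    `parameters` is a dict. Each parameter is emitted as `key=value`,
    in the order given. Illustrative helper, not part of FloPoCo.
    """
    argv = [executable]
    for name, params in operators:
        argv.append(name)
        argv.extend(f"{key}={value}" for key, value in params.items())
    return argv


argv = flopoco_argv([
    ("IntConstMult", {"wIn": 16, "c": 12345}),
    ("IntConstMult", {"wIn": 16, "c": 54321}),
])
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` if a `flopoco` executable is present in the current directory.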

Several transversal options are available; each one typically affects the operators occurring after it in the list.

- `target=Virtex5` sets the target hardware family. For a list of supported families, see the command-line help. We typically target the highest speed grade available for a family (see below for pipelining options).
- `frequency=300` sets the target frequency (in MHz).
- `name=UserProvidedName` replaces the (ugly and parameter-dependent) entity name generated by FloPoCo for the next operator. In particular, this allows changing parameters while keeping the same entity name, so that these changes are transparent to the rest of the project.
- `plainVHDL=yes` instructs FloPoCo to output concise and readable VHDL, using only the + and * VHDL operators instead of FloPoCo adders and subtractors. This helps in understanding the algorithms used by FloPoCo, but typically prevents or degrades automatic pipelining.
- `useHardMult=no` instructs FloPoCo not to use hard multipliers or DSP blocks.
- `unusedHardMultThreshold=0.3` instructs FloPoCo to use a hard multiplier (or DSP block) if less than 30% of this hard multiplier would be left unused. The ratio is between 0 and 1: 0 means that any sub-multiplier that does not fully fill a DSP goes to logic; 1 means that any sub-multiplier, even a very small one, will consume a DSP.
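The threshold rule for `unusedHardMultThreshold` can be sketched as a simple area ratio. This is only an illustration of the rule as described here, not FloPoCo's actual decision procedure. It assumes that the unused ratio is one minus the fraction of the DSP multiplier area covered by the sub-multiplier. The function name and the 24x17 DSP size are hypothetical. The comparison is inclusive so that both endpoints (0 and 1) behave as stated.

```python
def uses_dsp(w_x, w_y, dsp_w_x, dsp_w_y, threshold):
    """Decide whether a w_x-by-w_y sub-multiplier goes to a DSP block.

    Assumption (not taken from FloPoCo's source): the unused ratio is
    1 - (sub-multiplier area) / (DSP multiplier area).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if w_x > dsp_w_x or w_y > dsp_w_y:
        raise ValueError("sub-multiplier does not fit in one DSP block")
    unused = 1.0 - (w_x * w_y) / (dsp_w_x * dsp_w_y)
    return unused <= threshold


# Hypothetical 24x17 DSP multiplier, threshold 0.3 as in the option above.
print(uses_dsp(17, 17, 24, 17, 0.3))  # about 29% unused
print(uses_dsp(12, 17, 24, 17, 0.3))  # 50% unused
```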

The FloPoCo distribution also includes useful programs for converting the binary string of a floating-point number to human-readable form (`bin2fp`) and back (`fp2bin`).
The `longacc2fp` utility converts the fixed-point output of the LongAcc operator (see below) to human-readable form.

The floating-point format used in FloPoCo is identical to the one used in FPLibrary. It is inspired by the IEEE-754 standard.

An FP number is a bit vector consisting of 4 fields. From left to right:

- A 2-bit exception field
  - 00 for zero, 01 for normal numbers, 10 for infinities, and 11 for NaN
- A sign bit
  - 0 for positive, 1 for negative
- An exponent field on wE bits
  - It is biased as in IEEE-754. The smallest possible FP numbers have exponent field 00...00, the FP number 1.0 has the exponent field 011...11 and the largest possible FP numbers have exponent 11...11
- A fraction field on wF bits
  - The actual significand has an implicit leading 1, so the fraction field ff...ff represents the significand 1.ff...ff

The format is therefore parameterized by two positive integers wE and wF which define the sizes of the exponent and fraction fields respectively.
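The field layout described above can be made concrete with a small decoder. This is a minimal sketch written from the description in this section, not code taken from FloPoCo. The function name `decode_fp` is hypothetical, and the bias is assumed to be 2^(wE-1) - 1, which is what "biased as in IEEE-754" and the exponent field 011...11 for 1.0 imply. Normal values are returned as exact `Fraction` objects, infinities and NaN as Python floats, and the sign of zero is dropped.

```python
from fractions import Fraction

EXN_ZERO, EXN_NORMAL, EXN_INF, EXN_NAN = 0b00, 0b01, 0b10, 0b11


def decode_fp(bits, wE, wF):
    """Decode a FloPoCo FP bit string (exception, sign, exponent, fraction)."""
    if len(bits) != wE + wF + 3 or set(bits) - {"0", "1"}:
        raise ValueError("expected a binary string of wE + wF + 3 bits")
    exn = int(bits[0:2], 2)
    sign = -1 if bits[2] == "1" else 1
    exponent = int(bits[3:3 + wE], 2)
    fraction = int(bits[3 + wE:], 2)
    if exn == EXN_ZERO:
        return Fraction(0)
    if exn == EXN_INF:
        return sign * float("inf")
    if exn == EXN_NAN:
        return float("nan")
    bias = 2 ** (wE - 1) - 1                      # assumed IEEE-754-style bias
    significand = 1 + Fraction(fraction, 2 ** wF)  # implicit leading 1
    return sign * significand * Fraction(2) ** (exponent - bias)


# 1.0 in the (wE=8, wF=23) format: exception 01, sign 0, exponent 01111111.
one = "01" + "0" + "01111111" + "0" * 23
print(decode_fp(one, 8, 23))
```

For real test benches, `fp2bin` and `bin2fp` remain the reference tools; the sketch is only meant to show how the four fields combine.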

The utilities `fp2bin` and `bin2fp` will allow you to get familiar with the format and set up test benches.

There are two main differences between the format (wE=8, wF=23) and the IEEE-754 single precision format (the same holds for double).

Exceptional cases (zeroes, infinities and Not a Number or NaN) are encoded as separate bits in FloPoCo, instead of being encoded as special exponent values as in IEEE-754. This saves quite a lot of decoding/encoding logic. The main drawback of this format appears when results have to be stored in memory, where they consume two more bits. However, FPGA embedded memory can accommodate 36-bit data, so adding two bits to a 32-bit IEEE-754 format is harmless as long as data resides within the FPGA.

As a side effect, the exponent can take two more values in FloPoCo than in IEEE-754 (one for very large numbers, one for very small ones).
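The two extra exponent values can be made concrete for (wE=8, wF=23). The sketch below compares the range of normal numbers in both formats, assuming the IEEE-754-style bias 2^(wE-1) - 1 in both cases. In IEEE-754 the all-zeros and all-ones exponent fields are reserved, whereas in FloPoCo they encode ordinary numbers. The function names are hypothetical.

```python
from fractions import Fraction


def flopoco_normal_range(wE, wF):
    """Smallest and largest positive normal numbers in the FloPoCo format."""
    bias = 2 ** (wE - 1) - 1
    largest_significand = 2 - Fraction(1, 2 ** wF)
    smallest = Fraction(2) ** (0 - bias)                  # exponent field 00...00
    largest = largest_significand * Fraction(2) ** ((2 ** wE - 1) - bias)
    return smallest, largest


def ieee_normal_range(wE, wF):
    """Smallest and largest positive normal numbers in an IEEE-754 format."""
    bias = 2 ** (wE - 1) - 1
    largest_significand = 2 - Fraction(1, 2 ** wF)
    smallest = Fraction(2) ** (1 - bias)                  # field 00...00 is reserved
    largest = largest_significand * Fraction(2) ** ((2 ** wE - 2) - bias)
    return smallest, largest


fl_small, fl_large = flopoco_normal_range(8, 23)
ie_small, ie_large = ieee_normal_range(8, 23)
print(ie_small / fl_small, fl_large / ie_large)  # one extra binade at each end
```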

FloPoCo does not support subnormal numbers.
Subnormals are quite expensive, requiring dedicated shifters and LZOCs (leading zero/one counters). They are in the standard for two reasons:

- They provide a few more very small numbers and graceful accuracy degradation for very small values. However, this really doesn't mean much: the few extra subnormal numbers only buy you a little delay if your computation comes dangerously close to underflow.
- They allow the equivalence `x-y=0 <=> x=y`. With flush to zero as used in FloPoCo, x and y can be close enough that their difference is flushed to zero, while not being equal.
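The loss of the `x-y=0 <=> x=y` equivalence under flush to zero can be illustrated with exact arithmetic. The sketch below models only the underflow behaviour (no rounding) and assumes that the smallest normal number is 2^(-bias), that is, exponent field 00...00 with the IEEE-754-style bias. The helper `flush_to_zero` is hypothetical.

```python
from fractions import Fraction


def flush_to_zero(value, wE):
    """Return `value`, or zero if its magnitude is below the smallest normal."""
    bias = 2 ** (wE - 1) - 1
    smallest_normal = Fraction(2) ** (-bias)  # exponent field 00...00
    return value if abs(value) >= smallest_normal else Fraction(0)


wE = 8
tiny = Fraction(2) ** -(2 ** (wE - 1) - 1)  # smallest normal, significand 1.0
x = Fraction(3, 2) * tiny                   # same exponent, significand 1.5
y = tiny

diff = flush_to_zero(x - y, wE)
print(x == y, diff == 0)  # x and y differ, yet their difference is flushed
```

With subnormals, the exact difference (half the smallest normal number) would be representable and the subtraction would not return zero.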

So the main motivation for subnormals was the second point: people tend to assume for FP numbers the properties that hold for the reals. Subnormals buy us one such property (and a few others actually), and this was deemed worth the price for FPUs that would be placed in the hands of anybody. However, if you are reading this, you are not anybody.

If you are still not convinced, maybe you are right: please get in touch with us.

In any case, FloPoCo provides conversion operators to and from the IEEE-754 formats (single and double precision).

Numbers in the Logarithmic Number System (LNS) used in FloPoCo have an encoding similar to the floating-point format. It is also the same as the one used in FPLibrary.

Its fields are:

- A 2-bit exception field
  - Same encoding as floating-point: 00 for zero, 01 for the general case, 10 for infinities, and 11 for NaN
- A sign bit
  - 0 for positive, 1 for negative
- The integral part of the exponent on wE bits
- The fractional part of the exponent on wF bits
  - The fixed-point exponent is encoded in two's complement.

Reasonable values are 4 to 8 for wE, and 8 to 20 for wF. Other values are still allowed, including negative wE. Use at your own risk.
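A decoder for this layout might look as follows. This is a sketch under stated assumptions, not FloPoCo code. It assumes that the represented value is plus or minus 2 raised to the fixed-point exponent (base 2 is an assumption, not stated above). It also assumes that the wE + wF exponent bits form a single two's-complement number with wF fractional bits, and that wE is at least 1. The function name `decode_lns` is hypothetical.

```python
import math


def decode_lns(bits, wE, wF):
    """Decode an LNS bit string: exception, sign, then a fixed-point exponent."""
    width = wE + wF
    if len(bits) != width + 3 or set(bits) - {"0", "1"}:
        raise ValueError("expected a binary string of wE + wF + 3 bits")
    exn = int(bits[0:2], 2)
    sign = -1.0 if bits[2] == "1" else 1.0
    if exn == 0b00:
        return 0.0
    if exn == 0b10:
        return sign * math.inf
    if exn == 0b11:
        return math.nan
    raw = int(bits[3:], 2)
    if raw >= 2 ** (width - 1):      # two's-complement sign extension
        raw -= 2 ** width
    exponent = raw / 2 ** wF         # wF fractional bits
    return sign * 2.0 ** exponent    # assumed base-2 logarithm


# Exponent 0001.1000 = 1.5 with wE=4, wF=4, so the value is 2**1.5.
print(decode_lns("01" + "0" + "0001" + "1000", 4, 4))
```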

An operator may be combinatorial, or pipelined. A combinatorial operator has pipeline depth 0. An operator of pipeline depth 1 is obtained by inserting one and only one register on any path from an input to an output. Hopefully, this divides the critical path delay by almost 2. An operator of pipeline depth 2 is obtained by inserting two register levels, etc.

It should be noted that, according to this definition, pipelined operators usually buffer neither their inputs nor their outputs. For instance, connecting the input of a 400MHz operator to the output of another 400MHz operator may well lead to a circuit working at 200MHz only. It is the responsibility of the user or calling program to insert one more level of registers between two FloPoCo operators. This convention may be felt as a burden to the user, but it is the most sensible choice. It makes it possible to assemble sub-components without inserting registers in many situations, thus reducing the latency of complex components. Besides, different application contexts may have different policies (registers on output, or registers on input).
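The 400MHz example can be reproduced with an idealised timing model. In the sketch below, an operator of pipeline depth n is described by n+1 combinational segment delays (input to first register, between registers, last register to output). Register overhead and routing are ignored, and all names are hypothetical. It is a simplification for illustration, not FloPoCo's timing model.

```python
def max_frequency_mhz(segment_delays_ns):
    """Clock frequency allowed by the slowest register-to-register segment."""
    return 1000.0 / max(segment_delays_ns)


def chain_direct(first, second):
    """Connect two operators directly: the output segment of `first` and the
    input segment of `second` merge into one longer combinational path."""
    return first[:-1] + [first[-1] + second[0]] + second[1:]


def chain_with_register(first, second):
    """Insert one register level between the two operators."""
    return first + second


# A depth-1 operator whose two segments each take 2.5 ns.
op = [2.5, 2.5]
print(max_frequency_mhz(op))
print(max_frequency_mhz(chain_direct(op, op)))
print(max_frequency_mhz(chain_with_register(op, op)))
```

In this model the direct connection keeps the total depth at 2 but halves the frequency, while the extra register level restores the frequency at the cost of one more cycle of latency.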

Two command-line options control the pipelining of the FloPoCo operators that follow them.

- `pipeline=[yes|no]` (default yes) requires the operators to be pipelined. If `no`, the operator will be combinatorial. If `yes`, registers may be inserted if needed to reach the target frequency.
- `frequency=[frequency in MHz]` sets the target frequency. If the `pipeline` option is set, then FloPoCo will try to pipeline the operator to the given frequency. It will report a warning if it fails, or if frequency-directed pipelining is not yet implemented for this operator.

The philosophy of FloPoCo's approach to pipelining is the following:

- FloPoCo's approach is to provide a fair estimate of the pipeline depth required to obtain a given frequency, and a sensible placement of registers.
- FloPoCo's pipelining effort is always tentative: you may not get the frequency you asked for (sometimes you will even get a higher one). However, in such cases, increasing or decreasing the target frequency should also increase or decrease the obtained frequency. Note that you may do so on a per-operator basis, as in: `flopoco FPAdd frequency=200 wE=11 wF=53 FPMult frequency=300 wE=8 wF=23`
- If the obtained frequency is higher than needed, reducing the `frequency` option may save resources.
- The pipeline built by FloPoCo may depend on the target. When tuning it, we use the best possible speed grade for a given target family, for instance -12 for Virtex-4. If you want to target an FPGA with a lower speed grade, you may need to update `frequency` accordingly.
- Better results will always be obtained by using retiming tools, which can work on a circuit netlist after technology mapping. The pipeline built by FloPoCo should help these retiming tools converge faster to a global optimum.

Note that not all operators support pipelining (ultimately they all will). The command-line help indicates which ones do.