Baseboard4 Developer's Guide

Introduction

This document describes how to create new software-defined peripherals for the Baseboard4 using the Verilog hardware description language. To appreciate this document you should be comfortable with digital design and with Verilog. Sections of this document describe:

how to get started with Verilog,
how to create a new Wishbone peripheral,
how the DPCore build system works,
how to add a new peripheral driver module

How to Get Started with Verilog

In this section you will see how to use Verilog to build FPGA applications. The purpose of this section is to give non-Verilog users a sense of how Verilog works. This section assumes you are already familiar with digital circuit design. This section is broken into four topics:

“hello world” in Verilog
use iverilog to test your Verilog circuit
install the Xilinx compiler
compile your design and test it on the Baseboard

"Hello World" in Verilog

Most programming languages have a sample application that prints the phrase “Hello, World!” to the console. This application is often used to validate the installation of the language and its tool chain. For microcontrollers and FPGAs the equivalent application usually flashes an LED on the development board. The Verilog program below implements a counter on the Baseboard LEDs. Save the following as counter.v

  // Simple up counter for the Baseboard.
  // Visible update rate is about 12 times per second
  
  module counter(CK12, LEDS);
      input   CK12;        // 12.5 MHz input clock
      output  [7:0] LEDS;  // eight LEDs on Baseboard
  
      reg [27:0] count;    // 28 bit counter
  
      initial
      begin
          count = 0;
      end
  
      always @(posedge CK12)
      begin
          count <= count + 28'b1;
      end
  
      assign LEDS = count[27:20];   // display high 8 bits of counter
  
  endmodule

Have you ever seen a circuit board with gold or copper fingers for the connector? If so, you are already familiar with the idea of a module. You can think of a Verilog module as a complete circuit board with the names of the connector pins given in the module definition. The top module connects directly to the FPGA pins, and the other modules connect to the top module.

Think of this counter module as a circuit board with nine signal pins on its edge. In this case there is only one module, so the counter module is the top module.

  module counter(CK12, LEDS);
      input   CK12;        // 12.5 MHz input clock
      output  [7:0] LEDS;  // eight LEDs on Baseboard

Clearly the code inside the module describes the digital circuitry on our imaginary circuit board.

You already know that a register is just an array of flip-flops. In Verilog you create a register using the reg keyword. You have to tell the compiler how many flip-flops you want in the register by specifying the upper and lower flip-flop numbers.

      reg [27:0] count;    // a 28 bit counter

You can tell the Verilog compiler what values to place in registers when the FPGA is loaded. This is called the initial value of the register. Do this with the initial construct.

      initial
      begin
          count = 0;
      end

Consider a flip-flop. It has an input and an output. In Verilog inputs always appear on the left hand side of an assignment and outputs always appear on the right hand side. While not obvious, this is true for the following:

      always @(posedge CK12)
      begin
          count <= count + 28'b1;
      end

The left hand side is the input to the count register and the right hand side is the output of count plus one. The output of an edge-triggered flip-flop is given a value only on the edge of its input clock. Register assignments must appear inside a block that defines the clock source for the registers. That is what the always @(posedge CK12) line does. Verilog uses a special syntax to show an edge-triggered flip-flop assignment. That is the <= syntax. This can only appear in a block with a clock source. Some flip-flop schematic symbols use a tiny triangle at the clock input. You can think of the < in <= as that clock symbol.

Assignment outside of a synchronous block is done with just an equal sign. This is called a continuous assignment and is how you connect one module to another and how you connect inputs and outputs. Continuous assignments are also handy for giving a simple name to a complex piece of logic. In the counter application the line

      assign LEDS = count[27:20];   // display high 8 bits of counter

sets the value of LEDS to the high eight bits of the counter. Just as you should not drive a wire with two different outputs, Verilog wants just one output driving a wire or input. The following will generate a Verilog compiler error.

  assign outputA = inputX;
  assign outputA = count + 1;
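One way around the double driver is to pick between the two sources inside a single continuous assignment. The sketch below is only an illustration: the module name one_driver, the select input sel, and the 8-bit widths are assumptions made for this example and are not part of the Baseboard code.

```verilog
// Hypothetical example: one wire, one driver, two possible sources.
module one_driver(sel, inputX, count, outputA);
    input        sel;       // assumed select line: 1 = inputX, 0 = count + 1
    input  [7:0] inputX;    // first candidate source
    input  [7:0] count;     // second candidate source
    output [7:0] outputA;   // driven by exactly one assign

    // A single continuous assignment acts as a two-input multiplexer,
    // so outputA never has two competing drivers.
    assign outputA = (sel) ? inputX : (count + 8'd1);
endmodule
```

The conditional operator is the same construct the DPCore peripherals use later in this guide to choose what to place on DAT_O.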

If you have enjoyed this introduction you may want to get more information from one of the many books and on-line tutorials for Verilog. The Wikipedia page is both simple and fairly complete (https://en.wikipedia.org/wiki/Verilog). While the compiler can flag many errors it can not identify logic errors in your design. The easiest way to spot logic errors is to use a simulator that lets you look at each signal in the circuit.

Test Your Verilog Design Using Iverilog

The word “Verilog” is a combination of the words verification and logic. It was originally a hardware description language intended as a simulation tool to test and verify circuits. Only later was it used as input for circuit synthesis. Most commercial Verilog compilers include a simulation tool. In this section you will see how to use the open source Icarus Verilog (http://iverilog.icarus.com) to simulate your counter.

The simulation environment for a circuit is called a test bench. Recall how you were asked to think of a module as a circuit board. Think of a test bench as a motherboard into which you plug your module. This motherboard will drive all the inputs on your device under test and be able to change those inputs based on how many clock cycles have passed. You can view all of the internal signals and the output signals in the output of the simulation.

Install iverilog and gtkwave on a Debian system with the command:

  sudo apt-get install iverilog gtkwave

Save the following as counter_tb.v

  //  iverilog test bench for the simple counter in counter.v
  
  `timescale 10ns/10ns
  
  module counter_tb;
      // direction is relative to the DUT
      reg    clk;          // 12.5 MHz system clock
      wire   [7:0] leds;   // LEDs on Baseboard
  
      // Add the device under test
      counter counter_dut(clk, leds);
  
      // generate the clock
      initial  clk = 0;
      always   #4 clk = ~clk;  // half period is 40ns == 4 * timescale
  
      initial
      begin
          $dumpfile ("counter_tb.xt2");
          $dumpvars (0, counter_tb);
  
          // 100 million steps of 10ns is one second
          #100000000
          $finish;
      end
  endmodule

Run the simulation, convert the output to a gtkwave format, and display the results with the commands:

  iverilog -o counter_tb.vvp counter_tb.v counter.v
  vvp counter_tb.vvp -lxt2
  gtkwave counter_tb.xt2

To view the LED waveforms click on “counter_tb” and “counter_dut” in the top left gtkwave pane. Then click on “LEDS” in the lower left pane. Double click on “LEDS” in the display pane to expand the eight lines. Hold down the CTRL key and use the mouse scroll wheel to compress the display until the whole second of simulation is displayed. The display should show the LED lines counting up in binary.
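Waveforms are not the only way to check a simulation. A test bench can also check the result itself and print a verdict. The sketch below assumes the counter.v file from earlier in this section; the file name counter_check_tb.v and the module name counter_check_tb are made up for this example. Because LEDS shows count[27:20], the LEDs should read 1 after exactly 2**20 rising clock edges.

```verilog
// counter_check_tb.v : hypothetical self-checking test bench for counter.v
`timescale 10ns/10ns

module counter_check_tb;
    reg        clk;     // 12.5 MHz system clock
    wire [7:0] leds;    // LEDs driven by the counter

    // Add the device under test
    counter counter_dut(clk, leds);

    // generate the clock
    initial  clk = 0;
    always   #4 clk = ~clk;   // half period is 40ns == 4 * timescale

    initial
    begin
        // count[20] is the lowest bit shown on the LEDs, so the LEDs
        // advance by one every 2**20 rising clock edges.
        repeat (1 << 20) @(posedge clk);
        #1;   // let the non-blocking update of count settle
        if (leds !== 8'd1)
            $display("FAIL: expected LEDS == 1, got %d", leds);
        else
            $display("PASS: LEDS == 1 after 2**20 clocks");
        $finish;
    end
endmodule
```

Compile and run it the same way as the first test bench, for example with iverilog -o counter_check_tb.vvp counter_check_tb.v counter.v followed by vvp counter_check_tb.vvp.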

Install and Test the Xilinx Toolchain

Once your simulation output is correct you are ready to compile and download your design to the FPGA. This section describes how to install the Xilinx FPGA design tools, how to use the Xilinx command line tools to compile a Verilog design, and how to download the compiled code to the Baseboard. A later section will describe how to automate all these steps in a Makefile.

The Baseboard uses a Xilinx Spartan-3E and a USB interface for both downloads and a host interface. Since the Baseboard is downloaded through a USB serial port you do not need a JTAG cable or dongle.

Xilinx provides a set of free design tools, ISE, which are part of their WebPACK download. To get the WebPACK download you have to select it, register with Xilinx, and start the download.

Start by going to the Xilinx download site at: http://www.xilinx.com/support/download/index.htm. Click on the “ISE Archive” link and select “14.7” and then “Full Installer for Linux”. This will take you to a login page where you can select “Create Account” (since you probably don't already have a Xilinx account). You activate the account using a token sent in email. Your first login will present a page asking you to verify your name and address. The download starts automatically after selecting Download at the bottom of the name verification page.

Install the software by untarring the download file and running the “xsetup” script in the top level directory. If installing as a non-root user, you might want to create /opt/Xilinx/14.7 beforehand and give yourself write permission on it. You should be able to install ISE in a virtual machine but it might not install correctly in a docker image.

The installation will ask which products to install. We suggest the “ISE WebPACK” as it is the smallest and has everything you'll need. You need to “Acquire or Manage a License Key” but you do not need to install the Cable Drivers. Selecting Next then Install should start the installation.

Once the installation is complete you can add the Xilinx Verilog compiler toolchain to your path and verify that it can be found with the commands:

  export PATH=$PATH:/opt/Xilinx/14.7/ISE_DS/ISE/bin/lin64
  which xst

By default, ISE opens a graphical integrated development environment. DPCore is make based and you do not need to learn the IDE. You may recall that compiling a C++ or C program is broken into the steps of preprocessing, compiler pass 1, compiler pass 2, assembly, and linking. All these steps occur even though you only type g++ or gcc. In the same way, Verilog is compiled to binary in several steps.

Before compiling your Verilog to an FPGA binary you need to tell the compiler how the wires in the Verilog module map to the physical FPGA pins. Xilinx uses a “user constraints file” (.ucf) for this. The minimum UCF file for your counter is shown below. Save it as counter.ucf

  NET "CK12"      LOC = "P39"  ;    # 12.5 MHz clock
  NET "LEDS[0]"   LOC = "P70"  ;    # LED 0
  NET "LEDS[1]"   LOC = "P71"  ;    # LED 1
  NET "LEDS[2]"   LOC = "P62"  ;    # LED 2
  NET "LEDS[3]"   LOC = "P66"  ;    # LED 3
  NET "LEDS[4]"   LOC = "P67"  ;    # LED 4
  NET "LEDS[5]"   LOC = "P68"  ;    # LED 5
  NET "LEDS[6]"   LOC = "P63"  ;    # LED 6
  NET "LEDS[7]"   LOC = "P65"  ;    # LED 7

The commands that Xilinx uses to compile Verilog for a Spartan-3E can be hidden by a Makefile but you might be interested in the steps involved. There is insufficient space in this tutorial to give detailed descriptions of the commands. Your download of the Xilinx tools includes comprehensive manuals for the Xilinx command line tools which you can consult if you are interested. Look in ISE/doc/usenglish/books/docs/. The following paragraphs give a brief overview of the commands involved.

The first command, xst, synthesizes the Verilog file into a hardware design that is saved as a netlist file with an .ngc extension. Xilinx's xst program is actually a command line interpreter and it expects input from standard-in. Use an echo command and a pipe operator to give xst input from standard-in if you want to keep all of your build information in a Makefile.

  echo "run -ifn counter.v -ifmt Verilog -ofn counter.ngc -p xc3s100e-4-vq100" | xst

You have to specify the input file, the input file format, the name of the output file, and the exact type of FPGA. Xst generates several report files and directories, but the real output is a netlist file with an .ngc extension that is required for the next command. You can examine the output files and reports to better understand how the synthesis works. An appendix in the xst manual describes the output files and reports in detail.

The ngdbuild command further decomposes the design into FPGA native elements such as flip-flops, gates, and RAM blocks.

  ngdbuild  -p xc3s100e-4-vq100 -uc counter.ucf  counter.ngc

It is the ngdbuild command that first considers the pin location, loading, and timing requirements specified in the user constraints file, counter.ucf. Like the other Xilinx commands, ngdbuild produces several reports but its real output is a “Native Generic Database” stored in a .ngd file.

The Xilinx map command converts the generic elements from the step above to the elements specific to the target FPGA. It also performs a design rules check on the overall design. The map command produces two files, a Physical Constraints File and a Native Circuit Description file, that are used in subsequent commands.

  map -detail -pr b counter.ngd

The map command produces quite a few reports. As you gain experience with FPGA design you may come to rely on these reports to help identify design and timing problems.

The place and route command (par) uses the Physical Constraints File and the Native Circuit Description to produce another Native Circuit Description file which contains the fully routed FPGA design.

  par counter.ncd parout.ncd counter.pcf

Output processing starts with the bitgen program which converts the fully routed FPGA design into the pattern of configuration bits found in the FPGA after download.

  bitgen -g StartUpClk:CClk -g CRC:Enable parout.ncd counter.bit counter.pcf

The bitgen program lets you specify which clock pin to use during initialization and whether or not to generate a CRC checksum on the download image. Files which contain a raw FPGA download pattern are called bitstream files and traditionally have a .bit file extension. Bitstream files are good for downloads using JTAG, but since we're downloading over a USB serial connection one more command is required to convert the bitstream file into a download file.

  promgen -w -p bin -o counter.bin -u 0 counter.bit

The promgen program is a utility that converts bitstream files into various PROM file formats. The format for the Baseboard is called bin so the promgen command uses the -p bin option. The output of promgen, counter.bin, is what you download to the Baseboard FPGA card.

All of the commands described above, including xst, ngdbuild, map, par, bitgen, and promgen, have excellent PDF manuals in either the ISE/doc/usenglish/books/docs/xst directory or the ISE/doc/usenglish/de/dev directory of your WebPACK installation. For reference, the complete command sequence is:

  echo "run -ifn counter.v -ifmt Verilog -ofn counter.ngc -p xc3s100e-4-vq100" | xst
  ngdbuild  -p xc3s100e-4-vq100 -uc counter.ucf  counter.ngc
  map -detail -pr b counter.ngd
  par counter.ncd parout.ncd counter.pcf
  bitgen -g StartUpClk:CClk -g CRC:Enable parout.ncd counter.bit counter.pcf
  promgen -w -p bin -o counter.bin -u 0 counter.bit

Download Your Design to the Baseboard

When the Baseboard powers up or after pressing the reset button the FPGA waits for a binary image from the serial port. Linux serial port drivers can suppress certain characters from an output stream. To prevent this you need to turn off post processing on the serial port with the commands:

  sudo addgroup $LOGNAME dialout
  stty --file=/dev/ttyUSB0 -opost  # We want raw output

Press the reset button and send the FPGA binary to the Baseboard with the command:

  cat counter.bin > /dev/ttyUSB0

If all has gone well you should see an up counter on the Baseboard LEDs.

How to Write a Wishbone Peripheral

In this section you will see how to build your own custom Verilog peripheral. To appreciate this section you should already be familiar with digital circuit design and Verilog. This section is broken into four topics:

the DPCore Wishbone bus
clone an existing peripheral and rebuild DPCore
design tips for a DPCore peripheral
debug your peripheral with iverilog

The DPCore Wishbone Bus

A Wishbone Bus is a synchronous, parallel data bus intended to connect on-chip peripherals to an on-chip CPU. Wishbone describes both the interface signals to the peripherals and how the peripherals are connected to each other and to the CPU. The full details are in the Wishbone specification. Wishbone is a common interface for many of the projects at OpenCores.

In the case of DPCore, the Wishbone bus does not connect to a CPU but to an interface to a host computer.

Wishbone supports different peripheral/CPU interconnect topologies. You may already be familiar with a shared bus topology since early PCs used these as the ISA and PCI buses. A crossbar topology is often used when peripherals need to communicate amongst themselves or with a DMA controller. A point-to-point topology is often used when the bandwidth requirements of a peripheral would interfere with access to other peripherals. A ring topology is often used when speed is less important than the amount of FPGA fabric used in the system. DPCore uses a ring topology. Note that the topology does not necessarily affect the address, data, and control lines going to and from the peripheral. The diagram to the right shows the major Wishbone signals in a point-to-point topology.

Wishbone gives a general description of a peripheral bus. For example, Wishbone buses can be 8, 16, 32, or 64 bits wide. It is up to the implementer to decide things like bus width, clock frequencies, and which control lines to use. The Wishbone specification lists and defines both required and optional bus signals.

The diagram to the right shows the topology for DPCore. It shows two of the possible sixteen peripherals. The DPCore data bus is 8 bits wide. Each peripheral has 8 bits of internal addressing. That is, each peripheral can have up to 256 8-bit registers. You have previously seen that one advantage of DPCore is that you can have any mix of peripherals you want. This diagram illustrates why. All peripherals have the same interface, so any peripheral can be substituted for any other.

The paragraphs below describe the Wishbone bus as implemented for DPCore. We use _X to indicate both input (_I) and output (_O) signals. Instead of the terms Master and Slave we use the terms Controller and Peripheral, which better match our use of Wishbone. In our implementation, when a peripheral is not selected it must route DAT_I to DAT_O unchanged.

Peripheral Signal Names

CLK_I : System clock. All peripherals use this 20 MHz clock to drive state machines and other peripheral logic. This is used by the controller and all peripherals.

WE_I : Write enable. This is set to indicate a register write into the peripheral. A zero for WE_I indicates a read operation.

STB_I : Strobe. This is set to indicate that a bus cycle to this peripheral is in progress. The cycle can be either a register read/write or a poll.

TGA_I : Address tag. A bus cycle with TGA_I set is a normal register read/write. For a read bus cycle with TGA_I cleared, the peripheral places the number of bytes it wishes to send to the host on DAT_O. A DAT_O value of zero indicates that the peripheral has no data for the host at this time. If DAT_O is non-zero the controller internally generates a read request for the number of bytes specified.

ADR_I : Address. An 8 bit address that specifies which register in the peripheral to read or write. The peripheral can treat some addresses as simple register reads/writes and other addresses as top-of-stack for a FIFO.

STALL_O : Stalled. The peripheral asserts this signal to indicate that more system clock cycles are needed to complete the bus cycle. The controller waits for STALL_O to be deasserted before completing the read or write operation.

ACK_O : Acknowledge. The peripheral asserts ACK_O to tell the controller that the read or write bus cycle has successfully completed. This signal is also used in FIFO accesses to indicate that a FIFO is full (on write) or empty (on read). The controller writes successive bytes to the same address to fill a FIFO. As long as the bytes are successfully written, the peripheral asserts ACK_O. When a byte cannot be written, the peripheral does not raise ACK_O, so the controller knows that the FIFO is full and stops the sequence of writes at that point. The controller sends an acknowledgment to the host giving the number of bytes written (or read). This lets the host application know how many bytes were successfully written to the FIFO, so the application can resend the unacknowledged bytes at a later time.

DAT_X : An 8 bit data bus that is passed in a ring from the bus controller through all peripherals and back to the bus controller. This arrangement is close to the Wishbone Data Flow Interconnection but the data path is a ring. This arrangement is sometimes called a “serpentine” bus. The “Port Size” is 8 bits and the “Granularity” is 8 bits. There is no “Endianness” associated with the data bus. During a bus write cycle the peripheral latches DAT_I into the selected register. During a read bus cycle the peripheral ignores DAT_I and places the requested data on DAT_O.
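The ring rule for DAT_X can be tried out in a few lines of simulation. The sketch below is not DPCore code: ring_stage, ring_demo, and the VALUE parameter are hypothetical stand-ins that keep only the data path, with each stage either passing DAT_I through or replacing it with its own value when strobed.

```verilog
`timescale 1ns/1ns

// Hypothetical ring stage: pass DAT_I through unless this stage is selected.
module ring_stage(STB_I, DAT_I, DAT_O);
    parameter [7:0] VALUE = 8'h00;   // stand-in for a register's contents
    input        STB_I;              // ==1 if this stage is addressed
    input  [7:0] DAT_I;              // data from the previous stage
    output [7:0] DAT_O;              // data to the next stage

    assign DAT_O = (STB_I) ? VALUE : DAT_I;
endmodule

// Two stages chained in a ring between "from_ctrl" and "to_ctrl".
module ring_demo;
    reg  [1:0] stb;          // one strobe per stage
    reg  [7:0] from_ctrl;    // data leaving the bus controller
    wire [7:0] link;         // stage 0 output, stage 1 input
    wire [7:0] to_ctrl;      // data returning to the bus controller

    ring_stage #(8'hA0) s0(stb[0], from_ctrl, link);
    ring_stage #(8'hB1) s1(stb[1], link, to_ctrl);

    initial
    begin
        from_ctrl = 8'h00;
        stb = 2'b01;  #10 $display("select 0: to_ctrl = %h", to_ctrl);
        stb = 2'b10;  #10 $display("select 1: to_ctrl = %h", to_ctrl);
        stb = 2'b00;  #10 $display("none:     to_ctrl = %h", to_ctrl);
    end
endmodule
```

Whichever stage is selected, its value reaches the controller because every unselected stage forwards its input unchanged. With no stage selected the controller simply sees its own output come back around the ring.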

The Verilog code fragment below shows a typical peripheral interface definition. “Clocks” are system available strobes that occur every 100ns, 1.0us, 10us, 100us, 1.0ms, 10ms, 100ms, and 1 second. The four inout pins go to the FPGA pins. Some peripherals have eight instead of four FPGA pins.

  module dp_peri(CLK_I,WE_I,TGA_I,STB_I,ADR_I,STALL_O,ACK_O,DAT_I,DAT_O,clocks,pins);
      input  CLK_I;            // system clock
      input  WE_I;             // direction. Read-from-peri==0; Write-to-peri==1
      input  TGA_I;            // ==1 for register read/write, ==0 for data-to-send poll
      input  STB_I;            // ==1 if peri is addressed for r/w or poll
      input  [7:0] ADR_I;      // address of target register
      output STALL_O;          // ==1 if we need more clk cycles to complete
      output ACK_O;            // ==1 if we claim the address and complete the read/write
      input  [7:0] DAT_I;      // Data INto the peripheral;
      output [7:0] DAT_O;      // Data OUTput from the peripheral, = DAT_I if not us.
      input  [7:0] clocks;     // 100ns to 1 second pulses synchronous to CLK_I
      inout  [3:0] pins;       // FPGA pins for this peripheral

The DPCore implementation of Wishbone is fairly bare-bones. That is, it does not use other Wishbone signals such as: RST_I, TGD_I, TGD_O, CYC_I, ERR_O, LOCK_I, RTY_O, SEL_I, or TGC_I.
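To make the signal descriptions concrete, here is a minimal sketch of a peripheral with a single read/write register that drives the four pins. It is not taken from the DPCore sources: the name scratch4, the register at address 0, and the address decode in myaddr are assumptions for illustration, and the DAT_O expression follows the same pattern as the auto send example later in this guide. A real peripheral may decode addresses and handle polls differently.

```verilog
// scratch4.v : hypothetical single-register peripheral (not part of DPCore)
module scratch4(CLK_I,WE_I,TGA_I,STB_I,ADR_I,STALL_O,ACK_O,DAT_I,DAT_O,clocks,pins);
    input  CLK_I;            // system clock
    input  WE_I;             // Read-from-peri==0; Write-to-peri==1
    input  TGA_I;            // ==1 for register read/write, ==0 for poll
    input  STB_I;            // ==1 if this peripheral is addressed
    input  [7:0] ADR_I;      // address of target register
    output STALL_O;          // ==1 if we need more clk cycles to complete
    output ACK_O;            // ==1 if we claim the address
    input  [7:0] DAT_I;      // Data INto the peripheral
    output [7:0] DAT_O;      // Data OUTput from the peripheral
    input  [7:0] clocks;     // timing strobes, unused in this sketch
    inout  [3:0] pins;       // FPGA pins, used only as outputs here

    reg  [3:0] value;        // register 0: value driven onto the pins
    wire       myaddr;       // ==1 when the bus cycle targets register 0

    initial value = 0;

    // Assumed decode: selected when strobed and the address is zero.
    assign myaddr = (STB_I) && (ADR_I == 8'h00);

    // Latch DAT_I on a normal (TGA_I == 1) write cycle to register 0.
    always @(posedge CLK_I)
    begin
        if (TGA_I && myaddr && WE_I)
            value <= DAT_I[3:0];
    end

    // Pass DAT_I around the ring when not selected, return the register
    // on a normal bus cycle, and return zero on a poll because this
    // peripheral never has unsolicited data for the host.
    assign DAT_O   = (~myaddr) ? DAT_I :
                     (TGA_I)   ? {4'h0, value} :
                     8'h00;
    assign STALL_O = 1'b0;      // every access completes in one cycle
    assign ACK_O   = myaddr;    // claim the cycle whenever we are addressed
    assign pins    = value;
endmodule
```

Keeping STALL_O at zero and tying ACK_O to the address decode is the simplest possible choice. A peripheral with a FIFO or a slow internal operation would need real logic behind both signals.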

Clone an Existing Peripheral

It should be no surprise that the easiest way to build a new peripheral is to base it on an existing one. This section shows you how to do this.

Start with a working system built from source. Download the source code for DPcore and build a binary image with the following commands:

  git clone https://github.com/DemandPeripherals/DPCore.git
  cd DPCore/src
  # Edit perilist to set all peripherals to your new one
  vi perilist
  make
  sudo cp DPCore.bin /usr/local/lib

Expect several warnings about signals without loads. This happens because some pins are defined but never used. Now build the API daemon.

  git clone https://github.com/DemandPeripherals/dpdaemon.git
  cd dpdaemon
  make
  sudo make install
  # start dpdaemon and test Baseboard LEDs
  # (use sudo for the following if not in dialout group)
  dpdaemon -l /usr/local/lib/DPCore.bin -s /dev/ttyUSB0
  dpset bb4io leds 55

With everything built from source, you can now start adding your own code. Move to the DPCore/src directory and copy gpio4.v to myperi.v, where you can replace “myperi” with the name for your new peripheral. Edit the file to change all references of “gpio4” to “myperi”. Edit buildmain.c and clone the line for gpio4 so that

  {"gpio4", "gpio4", 0xf, 4 },

becomes

  {"gpio4", "gpio4", 0xf, 4 },
  {"myperi", "myperi", 0xf, 4 },

Edit perilist and replace all of the peripherals with your new peripheral name. The promise of DPCore is “any peripheral in any slot”, which implies that no peripheral is allowed more than its fair share of FPGA fabric. That is why you should fill perilist with your new peripheral. Rebuild DPCore.bin with a make and again copy DPCore.bin to /usr/local/lib.

Next is the myperi driver. Move to the dpdaemon/fpga-drivers directory and copy the gpio4 directory to myperi.

  cp -r gpio4 myperi

Change the name of the driver file and change the target name in the Makefile.

  mv myperi/gpio4.c myperi/myperi.c
  vi myperi/Makefile

While not strictly required, this is a good time to edit myperi.c and change the name of the peripheral. The line with the peripheral name should now look something like:

  pslot->name = "myperi";

Build, install, and run dpdaemon as you did earlier. Be sure to kill any running instances of dpdaemon before starting a new instance. Use sudo to run dpdaemon or add yourself to the dialout group.

  cd dpdaemon
  make
  sudo make install
  dpdaemon -l /usr/local/lib/DPCore.bin
  dplist

If all has gone well the list of peripherals should now include your new peripheral name. This might be a good time to do a git commit.

Design Tips for a DPCore Peripheral

This guide can not give you specific advice about your new peripheral but we can give some tips for its design and coding.

Your Verilog design actually starts with the driver and its API. Try to design the resources in the API to match how you view the peripheral at a high level. Your design goal for the driver is to put as much logic into it as possible so that the FPGA part of the peripheral can be as small and simple as possible. Once you've got a view of what the driver and Verilog each do, you can define the registers that link the driver to the FPGA logic. It is important to document the meaning, limits, and suggested use of the registers at the top of your Verilog file. This will help you maintain the code when you come back to it months or years later.

The module declaration for most DPCore peripherals looks the same.

  module myperi(CLK_I,WE_I,TGA_I,STB_I,ADR_I,STALL_O,ACK_O,DAT_I,DAT_O,clocks,pins);
      input  CLK_I;         // system clock
      input  WE_I;          // direction of this transfer. Read=0; Write=1
      input  TGA_I;         // ==1 if reg access, ==0 if poll
      input  STB_I;         // ==1 if this peri is being addressed
      input  [7:0] ADR_I;   // address of target register
      output STALL_O;       // ==1 if we need more clk cycles to complete
      output ACK_O;         // ==1 if we claim the above address
      input  [7:0] DAT_I;   // Data IN to the peripheral;
      output [7:0] DAT_O;   // Data OUT from the peripheral, = DAT_I if not us.
      input  [7:0] clocks;  // Array of clock pulses from 100ns to 1 second
      inout  [3:0] pins;    // Lines out to FPGA pins

After the module declaration you'll want to add wires and registers specific to your peripheral. All DPCore Wishbone peripherals implement a Mealy-Moore state machine (https://en.wikipedia.org/wiki/Moore_machine). When you write your Verilog be aware that you are implementing a state machine. The absolute best thing you can do for your future self or others reading your code is to describe in some detail the meaning of the registers you use and how they help implement your state machine. The design of your peripheral is all about the state machine it implements.

Timers and timing are common in peripherals. If your state machine is partly based on timing you might expect code something like the following:

  if ((mystate == `MYSTATE_A) && (clocks[`M10CLK]))
  begin
      if (polltmr == 0)
      begin
          mystate <= `MYSTATE_B;  // go to (describe next state)
          polltmr <= `STB_TMR;    // init timer for state B
      end
      else
          polltmr <= polltmr - 4'h1;
  end
  else if (mystate == `MYSTATE_B)
  begin
      ......

Both case statements and if / else if constructs are good for switching on state registers.
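For comparison, here is the same kind of timed state machine written with a case statement. Everything in this sketch is hypothetical: the module blink_fsm, the state names ST_WAIT and ST_PULSE, and the tick input (which stands in for one bit of the clocks array) were invented for the example.

```verilog
// Hypothetical two-state timer machine using a case statement.
`define ST_WAIT   1'b0
`define ST_PULSE  1'b1

module blink_fsm(CLK_I, tick, pulse);
    input  CLK_I;      // system clock
    input  tick;       // one-clock-wide strobe, e.g. one bit of "clocks"
    output pulse;      // high while the machine is in ST_PULSE

    reg        state;  // current state, ST_WAIT or ST_PULSE
    reg  [3:0] tmr;    // counts down the ticks left in this state

    initial
    begin
        state = `ST_WAIT;
        tmr   = 4'h9;
    end

    always @(posedge CLK_I)
    begin
        if (tick)
        begin
            case (state)
                `ST_WAIT:
                    if (tmr == 0)
                    begin
                        state <= `ST_PULSE;  // wait is over, start the pulse
                        tmr   <= 4'h1;       // init timer for ST_PULSE
                    end
                    else
                        tmr <= tmr - 4'h1;
                `ST_PULSE:
                    if (tmr == 0)
                    begin
                        state <= `ST_WAIT;   // pulse is over, wait again
                        tmr   <= 4'h9;       // init timer for ST_WAIT
                    end
                    else
                        tmr <= tmr - 4'h1;
            endcase
        end
    end

    // The output depends only on the current state.
    assign pulse = (state == `ST_PULSE);
endmodule
```

With only two states either style reads well. The case form tends to stay easier to scan as the number of states grows.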

Outputs are often based on the state of the peripheral. For example, to get a 10 millisecond pulse on pins[0] at the end of MYSTATE_A you would use Verilog something like:

  assign pins[0] = ((mystate == `MYSTATE_A) & (polltmr == 0));

Auto Send: The DPCore bus controller continuously polls each peripheral in turn to ask if the peripheral has data for the host. If the peripheral has data to send, the bus controller builds a read request packet, performs the bus read cycles, and sends the data to the host. This feature is called “auto send” and removes the need for an interrupt line to the host.

The bus controller uses the Wishbone line TGA_I to select a normal read/write cycle or an auto send poll. The code below shows how an internal flag, dataready, triggers an auto send packet.

  assign DAT_O = (~myaddr) ? DAT_I :
                 (~TGA_I && myaddr && (dataready)) ? 8'h08 :
                 (TGA_I) ? {5'h00,rout} :
                 8'h00 ;

In the above code you can see that data out equals data in if the peripheral is not selected, is rout if the peripheral is selected for a normal bus cycle (TGA_I == 1), and is equal to 8 if it is a poll bus cycle and there is data ready for the host. The “8” in the above code tells the bus controller how many bytes to send to the host. It might not be obvious, but the peripheral returns zero on a poll when it does not have data ready for the host. In an auto send the bus controller reads consecutive registers starting at register zero. For the above example this would mean reading registers 0 through 7 in the auto send response.

Debug Your Peripheral with Iverilog

Having done it both ways, this author can attest to the fact that it is much easier to debug a new peripheral using a simulator. This section gives sample code and a few tips for debugging your peripheral using iverilog.

You may recall from the counter example above that you can think of a test bench as a circuit board onto which you plug your new peripheral. Inputs to your peripheral are outputs from the test bench. As with all Verilog, it is best to start with an explanation of how the circuit works.

  /////////////////////////////////////////////////////////////////////////
  // sr04_tb.v : Testbench for the sr04 peripheral with parallel trigger
  //
  //  Registers are
  //    Addr=0/1    Echo time of sensor 1 in microseconds
  //    Addr=2/3    Echo time of sensor 2 in microseconds
  //    Addr=4/5    Echo time of sensor 3 in microseconds
  //    Addr=6/7    Echo time of sensor 4 in microseconds
  //    Addr=8/9    Echo time of sensor 5 in microseconds
  //    Addr=10/11  Echo time of sensor 6 in microseconds
  //    Addr=12/13  Echo time of sensor 7 in microseconds
  //    Addr=14     Trigger interval in units of 10 ms, 0==off
  //
  //  The test procedure is as follows:
  //  - Set the trigger interval to 40ms
  //  - Raise all inputs after 500us
  //  - Lower inputs after 10,11,12,13,14,15, and 16 ms
  //  - Verify that data ready flag goes high
  //  - Read all 14 echo time registers and verify times 
  `timescale 1ns/1ns

The test bench is self contained so it does not have input/output lines.

  module sr04_tb;

As mentioned above, the inputs to your circuit are registered outputs from the test bench and outputs from your circuit are wires to the test bench.

  reg    CLK_I;            // system clock
  reg    WE_I;             // direction of this transfer. Read=0; Write=1
  reg    TGA_I;            // ==1 if reg access, ==0 if poll
  reg    STB_I;            // ==1 if this peri is being addressed
  reg    [7:0] ADR_I;      // address of target register
  wire   STALL_O;          // ==1 if we need more clk cycles to complete
  wire   ACK_O;            // ==1 if we claim the above address
  reg    [7:0] DAT_I;      // Data INto the peripheral;
  wire   [7:0] DAT_O;      // Data OUT from the peripheral, = DAT_I if not us.
  reg    [7:0] clocks;     // Array of clock pulses from 100ns to 1 second
  wire   [7:0] pins;       // Pins to HC04 modules.  Strobe is LSB
  reg    [6:0] echo;       // echo inputs from the SR04 sensors
  
  // Add the device under test
  sr04 sr04_dut(CLK_I,WE_I,TGA_I,STB_I,ADR_I,STALL_O,ACK_O,DAT_I,DAT_O,clocks,pins);

The initialization for the test bench is similar to what you had for the counter.

  initial echo = 0;
  assign pins[7:1] = echo[6:0];

  // generate the clock(s)
  initial  CLK_I = 1;
  always   #25 CLK_I = ~CLK_I;
  initial  clocks = 8'h00;
  always   begin #50 clocks[`N100CLK] = 1;  #50 clocks[`N100CLK] = 0; end
  always   begin #950 clocks[`U1CLK] = 1;  #50 clocks[`U1CLK] = 0; end
  always   begin #9950 clocks[`U10CLK] = 1;  #50 clocks[`U10CLK] = 0; end
  always   begin #99950 clocks[`U100CLK] = 1;  #50 clocks[`U100CLK] = 0; end
  always   begin #999950 clocks[`M1CLK] = 1;  #50 clocks[`M1CLK] = 0; end
  always   begin #9999950 clocks[`M10CLK] = 1;  #50 clocks[`M10CLK] = 0; end
  always   begin #99999950 clocks[`M100CLK] = 1;  #50 clocks[`M100CLK] = 0; end
  always   begin #999999950 clocks[`S1CLK] = 1;  #50 clocks[`S1CLK] = 0; end
  
  // Test the device
  initial
  begin
      $display($time);
      $dumpfile ("sr04_tb.xt2");
      $dumpvars (0, sr04_tb);

Usually you will want to start with no activity on the bus.

      //  - Set bus lines and FPGA pins to idle state
      #50; WE_I = 0; TGA_I = 0; STB_I = 0; ADR_I = 0; DAT_I = 0;

Some time later you can start writing to the configuration registers in your design. You are addressing your registers as long as STB_I and TGA_I are high so be sure to set them low after writing to your configuration registers.

      #1000    // some time later
      //  - Set the sr04 trigger interval to 40ms
      #50; WE_I = 1; TGA_I = 1; STB_I = 1; ADR_I = 14; DAT_I = 4;
      #50; WE_I = 0; TGA_I = 0; STB_I = 0; ADR_I = 0; DAT_I = 0;

When debugging your circuit you might want to see not just what the test bench is doing but also what your circuit is doing. For example, to see the value being written to the configuration register you could add a $display statement to the Verilog for your peripheral. For sr04 this might appear as:

      // Latch new trigger interval on write to reg 14.
      if (TGA_I & myaddr & WE_I & (ADR_I[3:0] == 14))
      begin
          rate <= DAT_I[3:0];  // get poll interval 
          state <= `ST_WAIT;   // wait for next poll
          $display("New trigger rate is", DAT_I[3:0]);
      end

The $display statement in the above code is ignored when the code is compiled for an FPGA.

If you are dealing with inputs to the FPGA your test bench will have to drive those inputs. In the case of the sr04 the inputs are set at particular intervals.

      //  - Wait 10.1 ms for start of sampling
      #10100000
  
      //  - Trigger is done, now raise echo inputs
      echo[6:0] = 7'h7f;            // all inputs high waiting for ping response
      //  - Lower inputs after 10,11,12,13,14,15, and 16 ms
      #10000000 echo[0] = 1'b0;
      #1000000  echo[1] = 1'b0;
      #1000000  echo[2] = 1'b0;
      #1000000  echo[3] = 1'b0;
      #1000000  echo[4] = 1'b0;
      #1000000  echo[5] = 1'b0;
      #1000000  echo[6] = 1'b0;
      $display("inputs done at t=", $time);

End your test bench as you did for the counter.

      $finish;
      end
  endmodule

As before, run iverilog and view the waveforms with gtkwave.

  iverilog -o sr04_tb.vvp ../sysdefs.h sr04_tb.v ../sr04.v
  vvp sr04_tb.vvp -lxt2
  gtkwave sr04_tb.xt2

The code in this section has been taken in part from the sr04 test bench. Hopefully you will not have too much difficulty modifying it for your peripheral.
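The delay values in the test bench are easier to check if you compute them rather than count zeros. The following is a small host-side sketch in C, not part of the sr04 sources; the function names are made up for this example. It assumes the `timescale 1ns/1ns directive shown above, so one delay unit is one nanosecond. The echo times it returns are the ideal values, and the peripheral may report slightly different numbers depending on its sampling clock.

```c
#include <assert.h>
#include <stdint.h>

/* One Verilog delay unit is 1 ns with `timescale 1ns/1ns. */
#define TB_NS_PER_US 1000ULL
#define TB_NS_PER_MS 1000000ULL

/* Convert microseconds to a value usable in a Verilog # delay. */
static uint64_t tb_us_to_delay(uint64_t us)
{
    return us * TB_NS_PER_US;
}

/* Convert milliseconds to a value usable in a Verilog # delay. */
static uint64_t tb_ms_to_delay(uint64_t ms)
{
    return ms * TB_NS_PER_MS;
}

/* Ideal echo time in microseconds for sensor 0..6, given that all echo
   inputs rise together and sensor n falls (10 + n) ms later. */
static uint32_t tb_expected_echo_us(int sensor)
{
    assert(sensor >= 0 && sensor <= 6);
    return (uint32_t) ((10 + sensor) * 1000);
}
```

For example, tb_us_to_delay(10100) gives the 10100000 used for the 10.1 ms wait, and tb_ms_to_delay(1) gives the 1000000 used between successive echo inputs.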

How the DPCore Build System Works

How to Add a New Peripheral Driver Module

The next step after adding and testing your Verilog peripheral is to write a driver for it. This section describes the common features of the drivers and offers some tips that might simplify your driver.

We use the term “driver” but do not confuse these with real Linux kernel drivers. Driver is the right concept but technically our drivers are loadable plug-in modules implemented as shared-object files. Our existing drivers all use C but you can use any language that can produce a shared-object file. C, C++, and Rust are all good choices.

The code structure of drivers is fairly consistent from one driver to the next. This makes the documentation describing it all the more important. Your file header block should start with copyright and license information. Since the driver connects the dpset/dpget API to the registers you should include a description of the API as if you were describing it to someone who had never seen it before. This is where you answer the reader's question of “what does it do?” Next describe the registers and the meaning, if appropriate, of all of the bits in the registers. The final piece is a description of how the API values relate to the register values. The API-to-register documentation will make your driver much easier to maintain when you come back to it later.

DPCore drivers are event driven and have to deal with three events: creation, an API command from the user, and arrival of a packet from the FPGA. These three events are handled by Initialize(), which is executed when the module is attached to the daemon; usercmd(), which is a callback invoked for the API commands dpset, dpget, and dpcat; and packet_hdlr(), which is a callback executed when a packet arrives from the FPGA.
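The three events can be pictured with a very small skeleton. This is only a sketch: EV_SLOT and the callback signatures are simplified stand-ins invented for this example, not the daemon's real SLOT structure or callback prototypes, and the two counters exist only so the example can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the daemon's SLOT structure. */
typedef struct ev_slot EV_SLOT;
typedef void (*ev_pkt_cb)(EV_SLOT *pslot, const unsigned char *pkt, int len);
typedef void (*ev_user_cb)(int cmd, int rscid, const char *val, EV_SLOT *pslot);

struct ev_slot {
    ev_pkt_cb   pcb;     /* called when a packet arrives from the FPGA */
    ev_user_cb  pgscb;   /* called for dpset, dpget, and dpcat */
    int         ncmds;   /* number of user commands seen */
    int         npkts;   /* number of FPGA packets seen */
};

/* Event 2: an API command from the user. */
static void ev_usercmd(int cmd, int rscid, const char *val, EV_SLOT *pslot)
{
    (void) cmd; (void) rscid; (void) val;
    pslot->ncmds++;
}

/* Event 3: a packet from the FPGA. */
static void ev_packet_hdlr(EV_SLOT *pslot, const unsigned char *pkt, int len)
{
    (void) pkt; (void) len;
    pslot->npkts++;
}

/* Event 1: creation. Register the two callbacks with the slot. */
static int ev_initialize(EV_SLOT *pslot)
{
    assert(pslot != NULL);
    pslot->pcb = ev_packet_hdlr;
    pslot->pgscb = ev_usercmd;
    pslot->ncmds = 0;
    pslot->npkts = 0;
    return 0;
}
```

After ev_initialize() returns, the driver does nothing on its own. All further work happens inside the two callbacks when the daemon invokes them.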

Initialize()

To understand how to load a driver into dpdaemon you should, perhaps, have some understanding of how dpdaemon works.

The core of dpdaemon is a list of slots. Each slot has a SLOT structure (includes/daemon.h) which has the information needed to manage the peripheral in that slot. SLOT has the number of the slot, the name of the shared object file, and an array of resources (RSC in includes/daemon.h) for the peripheral. Resources, you may recall, is the generic term given to the attributes and data endpoints of the peripheral.

Peripheral #0 in the FPGA binary is the enumerator. This is just a copy of the perilist configuration file used to build the FPGA binary. When dpdaemon starts it loads the enumerator driver and reads the list of peripherals in the FPGA. It then loops through the list trying to load the shared object driver for each peripheral. When the driver is loaded dpdaemon looks up and calls the Initialize() routine in the driver. (Look for dlsym() in daemon/ui.c to see how this works.) The goal of Initialize() is to give dpdaemon (i.e. the SLOT structure) everything it needs to manage the peripheral.

Dpdaemon can have multiple instances of the same peripheral. This implies that an instance's internal state must be kept separate from the internal state of all other instances. To do this you should create a structure or object that holds your peripheral's internal state. For example, the gpio4 peripheral keeps the following state information:

  // All state info for an instance of a gpio4
  typedef struct
  {
      void    *pslot;    // handle to peripheral's slot info
      int      pinval;   // value of the (output) pins
      int      dir;      // pin direction (in=0, out=1)
      int      intr;     // autosend on change (no=0, yes=1)
      void    *ptimer;   // timer to watch for dropped ACK packets
  } GPIO4DEV;

The Initialize() routine is passed a pointer to its allocated SLOT structure (SLOT *pslot). Allocate memory for your peripheral state information and attach it to the SLOT structure with:

  MYPERIDEV *pctx;    // our local device context
  
  // Allocate memory for this peripheral
  pctx = (MYPERIDEV *) malloc(sizeof(MYPERIDEV));
  if (pctx == (MYPERIDEV *) 0) {
      // Malloc failure this early?
      dplog("memory allocation failure in myperi initialization");
      return (-1);
  }
  pslot->priv = pctx;

While not a hard requirement, generally the above is the only time your driver should allocate memory.
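To see why the state lives in a per-instance structure, consider two slots holding the same kind of peripheral. The sketch below uses CTX_SLOT and CTX_DEV as simplified, hypothetical stand-ins for SLOT and a device structure such as GPIO4DEV; only the priv pointer is modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for SLOT; only the private-data pointer matters here. */
typedef struct {
    void *priv;
} CTX_SLOT;

/* Per-instance state, in the style of GPIO4DEV. */
typedef struct {
    CTX_SLOT *pslot;   /* back pointer to the owning slot */
    int       pinval;  /* example state: value of the output pins */
} CTX_DEV;

/* Allocate one context per slot and attach it to that slot. */
static int ctx_attach(CTX_SLOT *pslot)
{
    CTX_DEV *pctx;

    assert(pslot != NULL);
    pctx = malloc(sizeof *pctx);
    if (pctx == NULL)
        return -1;
    pctx->pslot = pslot;
    pctx->pinval = 0;
    pslot->priv = pctx;
    return 0;
}

/* Recover the context inside a callback that is given the slot. */
static CTX_DEV *ctx_get(CTX_SLOT *pslot)
{
    assert(pslot != NULL && pslot->priv != NULL);
    return (CTX_DEV *) pslot->priv;
}
```

Because each slot gets its own allocation, changing pinval through one slot has no effect on the other. A driver that used file-scope variables instead would share one copy among all instances.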

  // Register this slot's packet callback (pcb).
  // Set its name, description and help text.
  (pslot->pcore)->pcb  = packet_hdlr;
  pslot->name = "myperi";
  pslot->desc = "Quad General Purpose Great Peripheral";
  pslot->help = README;

The help text is stored in the readme.txt file and is converted to readme.h as part of the build process. Be sure to give your readme.txt file a high level description of the peripheral and a detailed description of all of your peripheral's resources. You can help your users a lot by including examples that can be cut-and-pasted into a shell and will always work.

The Initialize() routine is where you set the name and properties of your resources. The pointer to the get/set callback (pgscb) can be unique to each resource or can point to one routine that handles all user API calls. Over time we have found that having one API callback is easier to understand and maintain, especially for simple peripherals. Your resource definitions might appear something like this:

  // Add the handlers for the user visible resources
  pslot->rsc[RSC_PINS].name = FN_PINS;
  pslot->rsc[RSC_PINS].flags = IS_READABLE | IS_WRITABLE | CAN_BROADCAST;
  pslot->rsc[RSC_PINS].bkey = 0;
  pslot->rsc[RSC_PINS].pgscb = usercmd;
  pslot->rsc[RSC_PINS].uilock = -1;
  pslot->rsc[RSC_PINS].slot = pslot;
  pslot->rsc[RSC_DIR].name = FN_DIR;
  pslot->rsc[RSC_DIR].flags = IS_READABLE | IS_WRITABLE;
  pslot->rsc[RSC_DIR].bkey = 0;
  pslot->rsc[RSC_DIR].pgscb = usercmd;
  pslot->rsc[RSC_DIR].uilock = -1;
  pslot->rsc[RSC_DIR].slot = pslot;
  pslot->rsc[RSC_INTR].name = FN_INTR;
  pslot->rsc[RSC_INTR].flags = IS_READABLE | IS_WRITABLE;
  pslot->rsc[RSC_INTR].bkey = 0;
  pslot->rsc[RSC_INTR].pgscb = usercmd;
  pslot->rsc[RSC_INTR].uilock = -1;
  pslot->rsc[RSC_INTR].slot = pslot;
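When several resources share the same callback and defaults, the assignments above can also be driven from a table. The following is one possible sketch, not how the existing drivers are written. The TBL_ types, flag values, and resource name strings are hypothetical stand-ins, and the pgscb callback pointer is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones come from the daemon headers. */
#define TBL_READABLE   1
#define TBL_WRITABLE   2
#define TBL_BROADCAST  4

enum { TBL_PINS, TBL_DIR, TBL_INTR, TBL_NRSC };

typedef struct tbl_slot TBL_SLOT;

typedef struct {
    const char *name;     /* user visible resource name */
    int         flags;    /* readable, writable, can broadcast */
    int         bkey;     /* broadcast key, 0 if no subscribers */
    int         uilock;   /* -1 if no UI is waiting on this resource */
    TBL_SLOT   *slot;     /* back pointer to the owning slot */
} TBL_RSC;

struct tbl_slot {
    TBL_RSC rsc[TBL_NRSC];
};

/* The parts that differ between resources. */
static const struct {
    const char *name;
    int         flags;
} tbl_defs[TBL_NRSC] = {
    [TBL_PINS] = { "pins", TBL_READABLE | TBL_WRITABLE | TBL_BROADCAST },
    [TBL_DIR]  = { "dir",  TBL_READABLE | TBL_WRITABLE },
    [TBL_INTR] = { "intr", TBL_READABLE | TBL_WRITABLE },
};

/* Fill in every resource; the common fields are set in one place. */
static void tbl_init_resources(TBL_SLOT *pslot)
{
    int i;

    assert(pslot != NULL);
    for (i = 0; i < TBL_NRSC; i++) {
        pslot->rsc[i].name = tbl_defs[i].name;
        pslot->rsc[i].flags = tbl_defs[i].flags;
        pslot->rsc[i].bkey = 0;
        pslot->rsc[i].uilock = -1;
        pslot->rsc[i].slot = pslot;
    }
}
```

The explicit assignments are easier to read for a peripheral with two or three resources. A table starts to pay off when the resource count grows and a forgotten uilock or slot assignment becomes a likely bug.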

usercmd()

The usercmd() routine is where you convert your API calls to read and write resources into packets of register reads and writes.

The interface to dpdaemon is a TCP socket. The daemon listens on the socket and accepts connections from application programs. The application program sends lines of text in the form

  [dpset|dpget|dpcat] [peri_name|slot_id] resource_name [resource_values]

The daemon parses each line of input and rejects lines that do not match the above format. The daemon checks for a valid peripheral name or slot ID, checks for a valid resource name, and verifies that the command (get/set/cat) is appropriate for the resource. If everything is valid, the daemon calls your get/set callback.
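The first stage of that parsing can be sketched as follows. This is not the daemon's actual parser, just an illustration of splitting a line into the four fields of the format above. The structure, function names, and field sizes are assumptions made for this example, and the lookups of peripheral and resource names are not shown.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    char cmd[8];     /* "dpset", "dpget", or "dpcat" */
    char peri[32];   /* peripheral name or slot ID */
    char rsc[32];    /* resource name */
    char val[128];   /* rest of the line, may be empty */
} CMD_LINE;

/* Copy the next blank-delimited token into dst and advance *pp past it.
   Returns 0 on success, -1 if there is no token or it does not fit. */
static int cmd_next_token(const char **pp, char *dst, size_t dstsz)
{
    const char *p = *pp + strspn(*pp, " \t");
    size_t len = strcspn(p, " \t\r\n");

    if (len == 0 || len >= dstsz)
        return -1;
    memcpy(dst, p, len);
    dst[len] = '\0';
    *pp = p + len;
    return 0;
}

/* Returns 0 if the line matches the command format, -1 otherwise. */
static int cmd_parse_line(const char *line, CMD_LINE *out)
{
    const char *p = line;
    size_t len;

    assert(line != NULL && out != NULL);
    if (cmd_next_token(&p, out->cmd, sizeof out->cmd) != 0 ||
        cmd_next_token(&p, out->peri, sizeof out->peri) != 0 ||
        cmd_next_token(&p, out->rsc, sizeof out->rsc) != 0)
        return -1;
    if (strcmp(out->cmd, "dpset") != 0 && strcmp(out->cmd, "dpget") != 0 &&
        strcmp(out->cmd, "dpcat") != 0)
        return -1;

    /* Everything after the resource name is the value, minus the newline. */
    p += strspn(p, " \t");
    len = strcspn(p, "\r\n");
    if (len >= sizeof out->val)
        return -1;
    memcpy(out->val, p, len);
    out->val[len] = '\0';
    return 0;
}
```

The value is kept as one string, spaces included, because interpreting it is the job of the driver's callback and not of the parser.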

The daemon passes a lot of information into your callback, including the command (DPGET, DPSET, or DPCAT), the resource index you set in Initialize(), and the string of the new value. There can be many instances of your peripheral, so the callback includes a SLOT pointer from which you can get the instance's private data structure. Your response to the application that issued the command should be a newline-terminated line of ASCII text. The text goes into the 'buf' parameter, and before returning you set *plen to the number of characters you put in buf. You should be able to use the following exactly as it is for your usercmd() callback.

  static void usercmd(
      int      cmd,        //==DPGET if a read, ==DPSET on write
      int      rscid,      // ID of resource being accessed
      char    *val,        // new value for the resource
      SLOT    *pslot,      // pointer to slot info.
      int      cn,         // Index into UI table for requesting conn
      int     *plen,       // size of buf on input, #char in buf on output
      char    *buf)
  {

Usually the first thing to do is get the “local context” for this instance of your peripheral. Do this with:

  pctx = (MYPERIDEV *) pslot->priv;

Your code now needs to switch based on the resource and command. A switch() statement works, as does a string of if()/else if() statements. Use your preferred coding style. Long or complex calculations based on the user input might be moved to a separate routine to keep usercmd() simple and readable. Your code might look something like the following:

  if ((cmd == DPGET) && (rscid == RSC_MYRSC1)) {
      ret = snprintf(buf, *plen, "%1x\n", pctx->intr);
      *plen = ret;  // (errors are handled in calling routine)
      return;
  }
  else if ((cmd == DPSET) && (rscid == RSC_MYRSC1)) {
      ret = sscanf(val, "%x", &newrsc1);
      if ((ret != 1) || (newrsc1 < 0) || (newrsc1 > 0xf)) {
          ret = snprintf(buf, *plen,  E_BDVAL, pslot->rsc[rscid].name);
          *plen = ret;
          return;
      }
      pctx->rsc1 = newrsc1;
      sendconfigtofpga(pctx, plen, buf);  // send rsc1 and rsc2 to FPGA
  }
  else if ((cmd == DPSET) && (rscid == RSC_MYRSC2)) {
      // Do a long or complex calculation in another routine
      newrsc2 = getrsc2(val);
  }
  else if ((cmd == DPCAT) && (rscid == RSC_MYRSC3)) {
  .....
  }

The above code shows how to respond to resource values that are out of range or otherwise invalid. This code hides sending the packets to the FPGA.
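The validate-then-reply pattern can be pulled into a small helper and tried outside the daemon. The sketch below is one way to do it. VAL_E_BDVAL is a made-up stand-in for the daemon's E_BDVAL string, the helper name is hypothetical, and setting *plen to 0 on success (meaning no text for the user) is an assumption of this example. It also clamps the length returned by snprintf(), which reports the length the full message would have had if it did not fit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error text; the real E_BDVAL is defined by the daemon. */
#define VAL_E_BDVAL "ERROR: invalid value for resource '%s'\n"

/* Parse a hex value in the range 0 to 0xf.
   On success store it in *out, set *plen to 0, and return 0.
   On failure put an error message in buf, set *plen to its length,
   and return -1.  *out is left unchanged on failure. */
static int val_parse_nibble(const char *val, const char *rscname,
                            int *out, int *plen, char *buf)
{
    unsigned int newval;
    int ret;

    assert(val && rscname && out && plen && buf);
    ret = sscanf(val, "%x", &newval);
    if ((ret != 1) || (newval > 0xf)) {
        ret = snprintf(buf, (size_t) *plen, VAL_E_BDVAL, rscname);
        if (ret < 0)
            ret = 0;                          /* encoding error */
        else if (ret >= *plen)
            ret = (*plen > 0) ? *plen - 1 : 0; /* message was truncated */
        *plen = ret;
        return -1;
    }
    *out = (int) newval;
    *plen = 0;
    return 0;
}
```

Note that %x expects a pointer to an unsigned int, so the helper parses into an unsigned variable and only needs the upper bound check. A negative input such as "-1" wraps to a large unsigned value and is rejected by the same test.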

Sending Packets to the FPGA

The daemon and DPCore communicate using a packet based protocol which is defined in include/fpga.h. You build a packet by setting the command, specifying the slot number, the register address, and the number of bytes in the data part of the packet. Your code to build a packet might appear as follows:

  static void sendconfigtofpga(
      MYPERIDEV *pctx,   // This peripheral's context
      int     *plen,     // size of buf on input, #char in buf on output
      char    *buf)      // where to store user visible error messages
  {
      DP_PKT   pkt;      // send write and read cmds to the gpio4
      SLOT    *pslot;    // This peripheral's slot info
      CORE    *pmycore;  // FPGA peripheral info
      int      txret;    // ==0 if the packet went out OK
      int      ret;      // generic return value
  
      pslot = pctx->pslot;
      pmycore = pslot->pcore;
  
      // Write the values for the pins, direction, and interrupt mask
      // down to the card.
      pkt.cmd = DP_CMD_OP_WRITE | DP_CMD_AUTOINC;
      pkt.core = pmycore->core_id;
      pkt.reg = MYPERI_REG_RSC1;   // the first reg of the three
      pkt.data[0] = pctx->rsc1;
      pkt.data[1] = pctx->rsc2;
      pkt.data[2] = pctx->rsc3;
      pkt.count = 3;
      txret = dpi_tx_pkt(pmycore, &pkt, 4 + pkt.count); // 4 header + data

Some peripherals use a FIFO as a data endpoint. In this case you would want to write all the bytes to one register. Other peripherals have a string of registers that should be written sequentially. These two modes are referred to as “no autoincrement” and “autoincrement”. Autoincrement can apply to both reading and writing registers so the four possibilities for the command are:

  pkt.cmd = DP_CMD_OP_WRITE | DP_CMD_AUTOINC;
  pkt.cmd = DP_CMD_OP_WRITE | DP_CMD_NOAUTOINC;
  pkt.cmd = DP_CMD_OP_READ  | DP_CMD_AUTOINC;
  pkt.cmd = DP_CMD_OP_READ  | DP_CMD_NOAUTOINC;

The routine to send a packet to the FPGA is dpi_tx_pkt(). You give it the peripheral address, the packet to send, and the total number of bytes in the packet. dpi_tx_pkt() returns a success or failure indication. You can use this to warn the user or to schedule another attempt. Generally, something is seriously wrong if dpi_tx_pkt() returns an error.
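The relationship between the packet fields and the length passed to the transmit routine can be shown with a stand-alone sketch. TX_PKT mirrors the field names used above (cmd, core, reg, count, data), but its layout, the bit values, and the maximum payload size are assumptions made for this example. The real definitions are in the daemon's fpga.h header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TX_DATA_MAX   16     /* hypothetical maximum payload size */
#define TX_HDR_LEN     4     /* cmd, core, reg, count */
#define TX_OP_WRITE  0x08    /* hypothetical bit values */
#define TX_AUTOINC   0x02

typedef struct {
    uint8_t cmd;                 /* operation and autoincrement flag */
    uint8_t core;                /* which peripheral (slot) */
    uint8_t reg;                 /* first register to write */
    uint8_t count;               /* number of data bytes */
    uint8_t data[TX_DATA_MAX];   /* payload */
} TX_PKT;

/* Fill in a write packet.  Returns the total number of bytes to hand to
   the transmit routine (header plus data), or -1 if count is out of range. */
static int tx_build_write(TX_PKT *pkt, uint8_t core, uint8_t reg,
                          const uint8_t *data, int count, int autoinc)
{
    assert(pkt != NULL && data != NULL);
    if (count < 1 || count > TX_DATA_MAX)
        return -1;
    pkt->cmd = (uint8_t) (TX_OP_WRITE | (autoinc ? TX_AUTOINC : 0));
    pkt->core = core;
    pkt->reg = reg;
    pkt->count = (uint8_t) count;
    memcpy(pkt->data, data, (size_t) count);
    return TX_HDR_LEN + count;
}
```

With autoincrement, three data bytes starting at register 0 land in registers 0, 1, and 2. Without it, all three bytes go to register 0, which is what you want for a FIFO.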

Handling Packets from the FPGA

When you initialized your peripheral instance you specified a packet receive callback. Your callback should be able to handle three types of packets from the FPGA. The first is an acknowledgement for a packet you sent. Use this packet to stop the timeout timer if you have set one. Otherwise the acknowledgement can be ignored.

The second kind of packet is a read response. Validate the packet and then read and format the packet data to send to the application. Data to the application must be formatted as an ASCII string terminated by a newline. When an application gives a dpget command the daemon marks the TCP connection as waiting for data from your peripheral. You send data back to the application using a call to send_ui().

The third kind of packet is an autosend packet. Recall that the FPGA does not have an interrupt line to the CPU and instead can automatically send packets up to the host. The autosend packet is similar in structure to a read response packet. The difference is the high bit of the cmd byte. In a read response the bit is set and in an autosend packet the bit is cleared. Autosend data is most often used with resources that support the dpcat command. The publish-subscribe system in dpdaemon allows multiple TCP connections to subscribe to the same resource. The routine to publish autosend data is the bcst_ui() routine. Your code for read responses and autosend data might look like:

      // If a read response from a user dpget command, send value to UI
      if ((pkt->cmd & DP_CMD_AUTO_MASK) != DP_CMD_AUTO_DATA) {
          pinlen = sprintf(pinstr, "%1x\n", (pkt->data[0] & 0x0f));
          send_ui(pinstr, pinlen, prsc->uilock);
          prompt(prsc->uilock);
  
          // Response sent so clear the lock
          prsc->uilock = -1;
          del_timer(pctx->ptimer);  //Got the response
          pctx->ptimer = 0;
          return;
      }
  
      // Process of elimination makes this an autosend packet.
      // Broadcast it if any UI are monitoring it.
      if (prsc->bkey != 0) {
          pinlen = sprintf(pinstr, "%1x\n", (pkt->data[0] & 0x0f));
          // bkey will return cleared if UIs are no longer monitoring us
          bcst_ui(pinstr, pinlen, &(prsc->bkey));
          return;
      }

You can see some of the internal workings of the daemon in the above code. The uilock tied to a resource tells your driver that it is in a state of waiting for a read response from the FPGA. The resource 'broadcast key', bkey, tells if any applications have subscribed to the stream of data offered by the resource.
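The three-way decision in a packet handler can be isolated into a small classifier. The sketch below follows the description in this section for the high bit of the cmd byte. Everything else is an assumption: the RX_OP_MASK and RX_OP_WRITE values are invented, and the idea that an acknowledgement can be recognized by a write operation code is a guess made for this example. Check the daemon's fpga.h header for the real bit definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define RX_AUTO_MASK  0x80u  /* high bit of cmd, as described in the text */
#define RX_OP_MASK    0x0cu  /* hypothetical position of the operation bits */
#define RX_OP_WRITE   0x08u  /* hypothetical: echoed in a write acknowledgement */

typedef enum { RX_ACK, RX_READ_RESPONSE, RX_AUTOSEND } RX_KIND;

/* Decide which of the three packet types has arrived. */
static RX_KIND rx_classify(uint8_t cmd)
{
    if ((cmd & RX_OP_MASK) == RX_OP_WRITE)
        return RX_ACK;              /* assumption: acks echo the write op */
    if ((cmd & RX_AUTO_MASK) != 0)
        return RX_READ_RESPONSE;    /* bit set: reply to a dpget */
    return RX_AUTOSEND;             /* bit clear: unsolicited data */
}

/* Format the low four bits of a data byte as one hex digit and a newline.
   Returns the number of characters written, not counting the final null. */
static int rx_format_nibble(uint8_t data0, char *out, size_t outsz)
{
    assert(out != NULL && outsz >= 3);
    return snprintf(out, outsz, "%1x\n", (unsigned int) (data0 & 0x0f));
}
```

Keeping the classification separate from the send_ui() and bcst_ui() calls makes it possible to test the handler's logic on a desktop machine without a running daemon.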

Non-FPGA Based Peripherals

If you have ever built an application using dpdaemon then you have most likely come to appreciate the clean, simple, publish-subscribe API that it offers. This section describes how you can use the dpdaemon and its API for non-FPGA based peripherals. Let's start with an example of how it works.

Dpdaemon comes with several examples of non-FPGA peripherals. The first one to test is the 'hello_world' demo. Start dpdaemon with any DPCore binary you have available. Then at a command prompt enter:

  dploadso hellodemo.so
  dplist
  dplist hellodemo

You should see the new peripheral listed in slot #11. The help text displays the resources available to the peripheral. Test it with the commands:

  dpget hellodemo messagetext
  dpset hellodemo messagetext "Hello, again!"
  dpset hellodemo period 5
  dpcat hellodemo message

The structure of non-FPGA based drivers is almost identical to FPGA based ones. You will still need the Initialize() and usercmd() routines. One difference is that non-FPGA based peripherals do not need a packet handler. However they may need the ability to respond to data arriving from a file descriptor. Working code for this is in the gamepad driver. If you have a device or socket that you want to use as a data source you can add a callback for your file descriptor with a call to add_fd(). An example taken from the gamepad driver's Initialize() routine is shown below:

  // Init our GAMEPAD structure
  pctx->pslot = pslot;       // this instance of the hello demo
  pctx->period = 0;          // default state update on event
  pctx->filter = 0;          // default is to report all controls
  pctx->indx = 0;            // no bytes in gamepad event structure yet
  (void) strncpy(pctx->device, DEFDEV, PATH_MAX);
  // now open and register the gamepad device
  pctx->gpfd = open(pctx->device, (O_RDONLY | O_NONBLOCK));
  if (pctx->gpfd != -1) {
      add_fd(pctx->gpfd, DP_READ, getevents, (void *) pctx);
  }

In the above case the callback getevents() is called when the file descriptor is readable. Callbacks are given the file descriptor that generated the callback as well as the transparent data pointer passed in when add_fd() is called. In the above example the transparent data is a pointer to the GAMEPAD pctx structure. The getevents() routine shows the callback structure.

  static void getevents(
      int       fd_in,         // FD with data to read,
      void     *cb_data)       // callback data (==*GAMEPAD)
  {
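A callback on a non-blocking descriptor has to cope with events that arrive in pieces, which is the reason for a field like indx in the context. The following sketch shows the general pattern and is not the gamepad driver's code. PAD_CTX, PAD_EVENT_SIZE, and the function names are hypothetical, and a real callback would decode each completed event instead of just counting it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PAD_EVENT_SIZE 8     /* hypothetical fixed event size in bytes */

typedef struct {
    unsigned char event[PAD_EVENT_SIZE];  /* partially received event */
    size_t        indx;                   /* bytes collected so far */
    int           nevents;                /* complete events seen */
} PAD_CTX;

/* Put a descriptor in non-blocking mode, as O_NONBLOCK does at open(). */
static int pad_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Callback in the style of getevents(): read whatever is available,
   keep any partial event for the next call, and never block. */
static void pad_getevents(int fd_in, void *cb_data)
{
    PAD_CTX *pctx = (PAD_CTX *) cb_data;
    ssize_t  n;

    assert(pctx != NULL);
    for (;;) {
        n = read(fd_in, pctx->event + pctx->indx,
                 PAD_EVENT_SIZE - pctx->indx);
        if (n <= 0)
            return;     /* nothing more now, end of file, or an error */
        pctx->indx += (size_t) n;
        if (pctx->indx == PAD_EVENT_SIZE) {
            pctx->nevents++;     /* a real driver would decode it here */
            pctx->indx = 0;
        }
    }
}
```

The loop only terminates because the descriptor is non-blocking. On a blocking descriptor the final read() would stall the whole daemon, so the call to pad_set_nonblocking(), or O_NONBLOCK at open(), is not optional.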

You can think of dpdaemon as having two parts, the daemon part and the FPGA part. The FPGA part is actually started as if it were a non-FPGA driver. As mentioned above, dpdaemon loads the driver for the “enumerator” peripheral and then the enumerator driver loads drivers for the peripherals found in the list from the FPGA. You can easily make dpdaemon entirely non-FPGA based by a small change in main() of daemon/main.c.

  // Add drivers here to always have them when the program starts
  // The first loaded is in slot 0, the next in slot 1, ...
  (void) add_so("enumerator.so");   // slot 0
  //(void) add_so("tts.so");      // first available slot after FPGA slots

To better understand this you might want to comment out the enumerator and add tts and gamepad to main.c and see how the resulting system is all non-FPGA peripherals.
