Skip to content

Integrating the Coprocessor

Usable

This lab description is complete. There could still be minor updates, which will be highlighted.

Try out Coprocessor Packaging, Integration using AXI Stream FIFO, and Integration using AXI DMA using the original template coprocessor provided in Lab 1 first, before using your version of the Lab 1 coprocessor.

Once you have gotten it to work, please attempt the assignment below.

Assignment 3

The assignment essentially involves combining Lab 1 and Lab 2, such that the data is streamed from the serial terminal (e.g., RealTerm) to the C code running on ARM Cortex A53, matrix multiplication is performed, and the results are sent back to the serial terminal.

The matrix multiplication should be done in 3 different ways as given below, and the time taken (measured using AXI Timer) should be reported in each case.

1) Purely in software 2) In hardware using the coprocessor interfaced via AXI Stream FIFO 3) In hardware using the coprocessor interfaced via AXI DMA

For the above, can either have two separate projects/platforms, or a single project/platform with 2 copies of the coprocessor - one connected via FIFO and the other via DMA.

All the required files are here.

Please follow the following procedure.

  • Change your Lab 1 HDL code to accommodate the bigger matrix sizes for A, B, and RES (64x8, 8x1, and 64x1 respectively).
  • Test it thoroughly (behavioral simulation as well as post-synthesis functional simulation) using a well-designed testbench. You will have to modify the .mem files and the testbench to deal with the bigger matrix. Some other cases were not tested by the Lab 1 testbench, such as the non-continuous assertion of S_AXIS_TVALID and M_AXIS_TREADY. Hopefully, these should be fine in your design. AXI FIFO is unlikely to do a non-continuous assertion of these signals, whereas some other peripherals such as AXI DMA are likely to.
  • Now, integrate this coprocessor using the same procedure you had followed for the original Lab 1 template coprocessor.  You might want to have a look at the Modifying the Coprocessor page to see how to re-package your modified coprocessor. You will have to modify test_fifo_myip_v1_0.c file as appropriate. Make sure that the Transmit FIFO Depth of AXI Stream FIFO is set correctly (in Vivado), or send the data in multiple tranches from your main program.
  • Once everything works fine using FIFO, try using DMA.
  • Initially, hard code matrices A and B as initialised arrays to avoid the hassle of having to send data over and over. Later, you should modify it to deal with the data streamed from RealTerm, just as you did for Lab 2.

Submission Info

Assignment 3 (6 marks)

Demonstrate in week 7.

Upload

Upload an archive containing only the files you have created/modified such as

  • .xsa file (s)
  • .c/.h file (s)
  • .v/.vhd scaled up RTL and testbench
  • .xdc file (only if you manually inserted/modified any constraints)
  • input/output test files - the .mem files as well as the files sent/received from/by RealTerm (only if you modified them)

exactly as used for the demo to Canvas within 1 hour of your demo.

It should be as a .zip archive, with the filename GroupNum_Lab3.zip.

Please DO NOT upload the whole project - include only the files mentioned above!

Tips

  • Debugging in Lab 3 is hard. It is your hardware, interacting with your software. It is hard to identify and isolate issues.
  • You can check the software functionality by setting breakpoints just before sending data to the coprocessor / just after receiving the data from the coprocessor.
  • If the software appears to be ok, then it is likely your hardware. This can be
    • A missing connection in the block diagram -  'Validate Design' running ok does not guarantee everything is ok. It only checks for some very essential stuff, which might not be good enough for proper functionality.
    • The IP not being updated. Changing the HDL code alone is not sufficient. You need to repackage the IP (see the last part of Packaging the Coprocessor as an IP page to know how to do this). Then try regenerating sources. Worst come to the worst, package the IP afresh.
    • A functional issue with your co-processor. This typically involves not asserting M_AXIS_TLAST and M_AXIS_TVALID correctly, and not dealing with M_AXIS_TREADY properly. Some possible reasons:
    • If you don't get a response from the co-processor, check if M_AXIS_TLAST is asserted properly.
    • If it doesn't respond with the correct amount of data, it could be M_AXIS_TLAST getting asserted at the wrong time.
    • If it responds with incorrect data, check the correspondence between M_AXIS_TVALID and M_AXIS_TDATA.
    • Verify these using post-synthesis functional simulation, not just using behavioral simulation.
  • You will possibly get a critical warning: "[Timing 38-282] The design failed to meet the timing requirements. Please see the timing summary report for details on the timing violations.".
  • This is not a warning that you should normally ignore. The timing analysis tool is complaining that the design may not work at 100 MHz, i.e. has a critical path greater than 10 ns (possibly in multiplication). You can look at the timing summary and figure out which path causes it to fail the timing. You can then see how this can be fixed, which will involve some design modifications.
  • Use the Vitis built-in debugger (TCF-based) to step through your software code, and inspect what is happening at each step, to ensure that the data sent to the co-processor is correct, as well as to inspect the value received from the coprocessor.