Discussion of a solution for SDI to PCIe that enables up to 4 bi-directional channels of 1080p video, including an examination of the applications, challenges and benefits associated with implementing PCIe-based systems, and a discussion of a video framework that simplifies hardware design for video systems with a PCIe-based design.
Upgrade Your Broadcast System to PCIe Gen2
1. Upgrade Your Broadcast System to PCIe Gen2 Brian Jentz, Altera Corporation Neil Childs, OmniTek
7. Devices for Broadcast
Device | Data Rates for 3G-SDI, PCI Express Gen1 and Gen2, and Gigabit Ethernet (Gbps per Lane) | Embedded Memory (Mbit) | Logic Elements | Transceiver Channels
Arria® II GX FPGA | 2.97, 2.5, 1.25 | 0.7 – 8.5 | 16K – 256K | 4 – 16 up to 3.75 Gbps
Stratix® IV GX FPGA | 2.97, 2.5, 5, and 1.25 | 6.3 – 20.3 | 70K – 530K | 0-16 up to 6.5 Gbps; 8.32 up to 8.5 Gbps
Stratix IV GT FPGA | 2.97, 2.5, 5, and 1.25 | 13.3 – 20.3 | 230K – 530K | 12 – 25 up to 11.3 Gbps; 12-24 up to 6.5 Gbps
HardCopy® IV GX ASIC | 2.97, 2.5, 5, and 1.25 | 6.3 – 20.3 | 2.8M – 11.5M usable ASIC gates | 8 – 36 up to 6.5 Gbps
9. PCIe Hard IP Testing
Altera PCIe Development Cards: Stratix IV GX (PCIe Gen1 and Gen2); Arria II GX (PCIe Gen1). PCIe Gen2 Motherboard.
PCIe Gen2 platforms:
Motherboard | Chipset | Processor
AMD: Asus M3A78-T | 790GX/SB750 | AMD790GX
Intel: Asus P5Q-EN | Intel-645 | LGA775
Intel DX58S0 | Tylersburg | Core I7
10. Video Framework with PCIe (block diagram). Interfaces: SDI, DVI/HDMI, BT656, Display port, Ethernet, PCIe. Blocks: Input, Output, Format Conversion, Deinterlacer (VIP), Scaler (VIP), CODEC, Audio, Sample rate converter, Video Delay, frame sync, TS Mux, Video Over IP, DDR2/3 Mem ctl, DMA Ctl, MPEG2, H.264, JPEG2000, Custom video functions. Legend: Altera® IP, Custom IP, Third-Party IP.
11. Video Framework: SOPC/Avalon-ST Video SOPC Builder Ready Avalon ® -ST Video SOPC-Ready Function SDI/HDMI/DP Clocked Video Input AV-ST-V vip1 AV-ST-V vip2 AV-ST-V Clocked Video Output SDI/HDMI/DP
12. Example: Format Conversion (block diagram, two channels with the same blocks). Blocks per channel: SD/HD/3G-SDI, SDI, CVI, CLIP, CRS 4:2:2 to 4:4:4, Motion adaptive Deinterlacer 4:2:2, Polyphase scaler (SCL) 6x6 taps, 4:4:4 mode, Frame buffer (Frame rate conversion), CRS 4:4:4 to 4:2:2, Interlacer 4:2:2, CVO. Shared: DDR2 HP memory, Nios II processor (run-time configuration through Nios® II processor). Connections: Avalon-ST video, Clocked video. Res.: 480i to 1080p.
13. Four-Channel 1080p Video Streaming with Stratix IV GX FPGA SDI core Standard detect Clocked in Multichannel Video Streaming DMA Controller Stratix IV GX FPGA PCIe Hard IP SOPC tool flow SDI video Streaming PCIe video Ext. memory interface Clocked out 352 insert MUX Test pattern
18. FPGA Block Diagram Stratix IV FPGA SOPC Builder component Control registers Video output Video input SDI MegaCore function Multiplex 352 insert 352 insert Test pattern (color bar) Standard detect Stratix IV FPGA PCIe hard IP OmniTek translation OmniTek multi-channel streaming DMA controller MDMA FDMA FDMA Test mem Clocked video input Clocked video output FPGA capability Timing LUT Video I/O capability
19. SOPC Builder DMA Component DMA controller wrapper Altera PCIe hard IP DMA Controller Core DMA capability registers PCIe translation block - Avalon-MM BAR BAR 0 DMA scatter gather DMA master controller FDMA channel FDMA channel MDMA channel Further FDMA channels Avalon-ST video interface Avalon-ST video interface Avalon-ST video interface Avalon-MM Avalon-ST video
21. PC GUI Application: view four incoming video streams; detect video standard; generate four video outputs; select output standards.
Hello and welcome to Upgrade Your Broadcast System to PCIe Gen 2 brought to you by Altera Corporation and Omnitek. My name is Brian Jentz and today I will be discussing a solution for SDI to PCIe that enables up to 4 bi-directional channels of 1080p Video. This is part 1 of the Video Framework Online Series. Later this year, we will present part 2: Remove the external bottleneck in your video design. Also, available now is an 8-minute on-line demo which shows how format conversion can be accelerated using the 1080p video framework.
First, we will look at the applications, challenges and benefits associated with implementing PCIe-based systems. Next, we will discuss a video framework that simplifies hardware design for video systems. The solution we will discuss today leverages this framework to simplify PCIe-based design. Then, Neil Childs from Omnitek will discuss a very efficient SDI to PCIe solution that addresses those key challenges. Finally, we will look at next steps you can take to find out more and evaluate this solution.
The IT industry has been playing an increasingly important role in broadcast. This includes leveraging PC technology and standards such as Ethernet and PCI Express. Many applications have moved away from proprietary motherboards and embedded processors to PC-based motherboards with x86 processors and bus structure. This simplifies software and hardware development. There has also been an explosion of video content providers – from university sports to houses of worship. This has driven significant growth in PC-based capture and editing. The combination of these two areas has driven a dramatic increase in the development and use of PCIe-based cards in broadcast and pro A/V. I have listed a number of the common applications on this slide.
With ESPN launching the first 1080p-capable studio in Los Angeles in April, the broadcast industry looks poised to start the transition to 1080p. With 1080p comes the requirement for higher bandwidth. A typical broadcast system with 4 SDI I/O will require 10 Gbps of bandwidth across the PCIe bus. PCIe uses 8b10b encoding, which reduces raw data throughput by 20%. So PCIe Gen2, at 5 Gbps per lane, drops to 4 Gbps per lane once 8b10b is taken into account. Additional protocol overhead reduces the practical bandwidth to 3.375 Gbps per lane, so a PCIe Gen2 x4 link offers a practical bandwidth of 13.5 Gbps. To support 4 1080p channels across 4 lanes, PCIe Gen2 is a must. Even then, the system needs to employ a DMA controller with approximately 80% efficiency. So 1080p comes with a significant bandwidth challenge for PCIe-based systems.
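The lane arithmetic in this slide can be checked with a short calculation. The Python sketch below uses only figures quoted in the talk (5 Gbps raw per Gen2 lane, 8b10b encoding, 3.375 Gbps practical per lane, 10 Gbps required) plus the 2.5 Gbps Gen1 line rate listed on the devices slide. The constant and function names are illustrative and not part of any Altera or Omnitek tool.

```python
# Figures quoted in the talk; all names here are illustrative only.
GEN1_RAW_GBPS = 2.5          # PCIe Gen1 line rate per lane
GEN2_RAW_GBPS = 5.0          # PCIe Gen2 line rate per lane
ENCODING_EFFICIENCY = 0.8    # 8b10b: 8 payload bits per 10 line bits
GEN2_PRACTICAL_GBPS = 3.375  # per lane after protocol overhead (from the talk)
REQUIRED_GBPS = 10.0         # four 1080p SDI I/O channels (from the talk)


def after_8b10b(raw_gbps_per_lane, lanes):
    """Link bandwidth that remains once 8b10b encoding is accounted for."""
    return raw_gbps_per_lane * ENCODING_EFFICIENCY * lanes


gen1_x4 = after_8b10b(GEN1_RAW_GBPS, 4)       # below the 10 Gbps required
gen2_x4 = after_8b10b(GEN2_RAW_GBPS, 4)       # 16 Gbps
gen2_x4_practical = GEN2_PRACTICAL_GBPS * 4   # 13.5 Gbps
utilisation = REQUIRED_GBPS / gen2_x4_practical

print(f"Gen1 x4 after 8b10b: {gen1_x4:.1f} Gbps")
print(f"Gen2 x4 after 8b10b: {gen2_x4:.1f} Gbps")
print(f"Gen2 x4 practical:   {gen2_x4_practical:.1f} Gbps")
print(f"Share of practical link needed: {utilisation:.0%}")
```

With the 10 Gbps figure, the link has to sustain roughly 74% of its practical bandwidth. With the 10.6 Gbit/sec payload quoted later in the talk, the share rises to about 79%, which is in line with the approximately 80% DMA efficiency mentioned on this slide.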
In addition to bandwidth, there are a number of other challenges in a PCIe-based broadcast system. The SDI I/O implementation needs to support all data rates and frame rates, from 270 Mbps to 3 Gbps and from 24 Hz to 60 Hz. PCIe introduces latency in the system, which needs to be mitigated. PCIe and SDI operate on different clocks, which requires an alignment of clock domains. There are many different PC motherboard chipsets available, which forces hardware testing across many platforms. Finally, with any broadcast equipment, the first company to market with the right feature set has a significant advantage and usually ends up with market share leadership.
First, we will discuss a video framework that simplifies hardware design for video systems. The solution we will discuss today leverages this framework to simplify PCIe-based design. Next, we will look at the applications, challenges and benefits associated with implementing PCIe-based systems. Then, Roger Fawcett from Omnitek will discuss a very efficient SDI to PCIe solution that addresses those key challenges. Finally, we will look at next steps you can take to find out more and evaluate this solution.
There are several different choices of FPGAs for broadcast applications. As an example, Altera’s 40-nm Stratix IV GX devices offer up to 32 6.5 Gbps transceivers, high on-chip memory, video processing performance of 300+ MHz, and 533 MHz external memory performance. Altera’s 40-nm Arria II GX family delivers cost- and power-optimized silicon for broadcast, featuring up to 16 3.75 Gbps transceivers and 300 MHz external memory performance. Stratix IV GT delivers 10+ Gbps transceivers, which enable 10G Ethernet and 10G SDI for aggregated SDI applications. HardCopy ASICs enable the lowest cost and power for applications such as portable cameras.
The PCIe hard IP is pre-verified and 100% timing closed, which saves design teams significant effort on what is a complex function (up to 15K LEs). It also shortens compile time, fits into a smaller FPGA, and saves power relative to a soft IP implementation. In Stratix IV, the PLL in the transceiver block converts the 100 MHz motherboard reference clock to 250 MHz for Gen2 operation, so no external PLL is required. The hard IP integrates the transaction layer (TL), data link layer (DLL), physical interface/media access control (PHY/MAC), and transceivers. It is a low-risk, hardware-verified solution, supported by PCI-SIG compliance workshops, interoperability with multiple ASSP vendors, 5 generations of transceiver-based FPGAs with PCI Express support, and development kits and demo boards.
One of the challenges we discussed earlier was interoperability with a number of different platforms in the industry. Altera has passed PCI-SIG tests and compatibility tests with existing hardware platforms. Interoperability testing covers compatibility and functionality with chipsets, a core generator with verifying payloads (built-in DMA engine), and PCIe compliance using PCI-SIG tests. Performance testing covers the throughput of the PCIe link and a stress test using the chain DMA architecture. The test configurations are PCI Express Gen1 x1, x4, x8 and PCI Express Gen2 x1, x4, x8. Now that we have discussed the design challenges for broadcast systems with PCIe and the critical capabilities offered by Altera, I will turn it over to Omnitek to discuss the details of the reference design that they provide.
This diagram shows a typical broadcast system and solutions that exist from Altera and its 3rd-party network. Of course, SDI is the primary I/O standard for broadcast, but other standards co-exist as well. For example, many multiviewers have DVI or HDMI outputs today and may add DisplayPort in the future. More and more, studios want all their equipment to support any format, so format conversion is moving from an optional feature to a standard requirement on all I/O. Due to requirements for video frame stores, relatively wide external DDR memory can be found in almost all video equipment, and the performance of the memory controller and scheduler is a critical consideration in broadcast design. Altera is introducing a new controller and scheduler at IBC 2009 that is optimized for broadcast video applications. More details will be covered outside this webcast. Video codecs are implemented in FPGAs in video servers, contribution encoders, distribution encoders, and IRDs. Altera has a network of companies offering solutions in that space. A number of solutions for audio exist, including audio de-embed, sample rate converters, and codecs such as AC3. Altera has been a leader in enabling video over IP in studio and headend applications, introducing its first video over IP reference design in 2006. This webcast primarily focuses on SDI, PCIe, and DMA controller functions. One of the historical challenges for video design has been the lack of an industry interface for connecting different functions together. In response to this, Altera has led the way in introducing a standard that has gotten significant 3rd-party and OEM support.
Altera introduced an open standard called Avalon Streaming Video. It defines connectivity and a lightweight protocol to enable interoperability between functions developed by different design teams. Avalon Streaming Video is the basis of a video framework that speeds video design. There are a number of 3rd-party video processing cores that support Avalon Streaming Video. Altera's SOPC Builder tool recognizes this standard and enables integration of these blocks through a graphical interface. The tool creates interconnect that is correct by construction, which simplifies FPGA design for video systems.
Format conversion is a design that Altera developed which leverages this video framework plus internally developed video processing blocks to implement up/down/cross conversion. This design features two channels with support from 480i up to 1080p with full motion adaptive deinterlacing. A RISC processor embedded in the FPGA, the Nios II, allows run-time configuration of all of the functions. This design offers quality and a feature set comparable to off-the-shelf chips, plus the ability to customize for a particular application and integrate with other FPGA-based functions. This design fits in an Arria II GX device, making it very cost effective versus other solutions.
This is a block diagram showing the key functions of a 4-channel SDI to PCIe Gen 2x4 implementation. Altera has offered SDI since 2003 and demonstrated the industry’s first triple rate SDI implementation at IBC 2006. Altera’s triple rate SDI core handles all the data rates and frame rates required for broadcast.
Firstly, hello, I’m Neil Childs of Omnitek and we have partnered with Altera to produce an SDI -> PCIe -> SDI reference design.
This is a complete solution, built around our Multichannel streaming DMA controller IP and accompanying driver/software. One key intention is that you should not need to have a deep understanding of PCIe. Most of the PCIe controllers available as soft and hard IP require you to format PCIe packets, manage tags, and so on. By bundling our DMA controller, translation block, and the hard IP into one component, we can handle all this for you.
We started from the aim of transferring 4 * 1080p60 videos in both directions simultaneously. These are handled by what we term FDMA (FIFO DMA) channels. Our DMA controller can also use more traditional MDMA (Memory DMA) channels. The relative number, size and depth of these channels are all GUI configurable. We also wanted to minimize the local buffering required to support 3G-SDI. This requirement, together with the high overall bandwidth required, places very high demands on our DMA controller and the PC architecture we operate within. We have designed the DMA controller and translation block together to use a lot of the more advanced PCIe features to achieve this. While we have a picture of the S4GX card on this slide, I should point out that we also provide a reference design for the Arria II GX equivalent.
The key to achieving the high bandwidth required for 4*1080p60, and the continuous bandwidth needed to allow operation with minimum local buffering, is the Scatter Gather DMA controller. This is a fairly standard feature of DMA controllers, required for use in a PC environment where memory is typically allocated in 4 Kbyte blocks which may not be contiguous. In this system, DMA operations are broken down into segments and a linked list of these instructions is created by software. When the DMA controller completes one set of instructions, it fetches the next link in the list and processes that set of instructions. This has the benefit that the CPU can set up several video frames' worth of DMA instructions in advance, which means that the CPU does not have to complete any time-critical operations. One advanced feature of the Omnitek DMA controller is that it pre-fetches the next SG segment. If it did not do this, the system would pause at the end of a segment while the next set of instructions is fetched. This feature is critical to allow operation without external SDRAM buffering.
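The scatter-gather idea described here can be sketched in a few lines of Python. This is a simplified software model under stated assumptions, not the Omnitek descriptor format or driver API: the `Descriptor` class, `build_sg_list`, and `transfer` are hypothetical names, merging physically contiguous pages into one segment is an illustrative choice, and a "stall" simply counts each time the modelled engine would have to wait for a descriptor fetch.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 4096  # memory is typically allocated in 4 Kbyte blocks


@dataclass
class Descriptor:
    """One scatter-gather segment (hypothetical layout)."""
    address: int
    length: int
    next: Optional["Descriptor"] = None


def build_sg_list(page_addresses: List[int], total_bytes: int) -> Optional[Descriptor]:
    """Build a linked list with one descriptor per run of contiguous pages."""
    head = tail = None
    remaining = total_bytes
    for addr in page_addresses:
        if remaining <= 0:
            break
        chunk = min(PAGE_SIZE, remaining)
        if tail is not None and tail.address + tail.length == addr:
            tail.length += chunk          # page follows the previous one: extend
        else:
            node = Descriptor(addr, chunk)
            if head is None:
                head = node
            else:
                tail.next = node          # link a new non-contiguous segment
            tail = node
        remaining -= chunk
    if remaining > 0:
        raise ValueError("not enough pages for the requested transfer")
    return head


def transfer(head: Optional[Descriptor], prefetch: bool):
    """Walk the list; return (bytes moved, stalls waiting for a descriptor)."""
    moved = stalls = 0
    node, first = head, True
    while node is not None:
        if first or not prefetch:
            stalls += 1   # engine idles while this descriptor is fetched
        first = False
        moved += node.length
        node = node.next  # with prefetch, this fetch overlaps the data transfer
    return moved, stalls


pages = [0x1000, 0x2000, 0x8000, 0x9000, 0x20000]
sg = build_sg_list(pages, 5 * PAGE_SIZE - 100)
print(transfer(sg, prefetch=False))
print(transfer(sg, prefetch=True))
```

In this model the five pages collapse into three segments. Without pre-fetch the engine waits once per segment; with pre-fetch only the initial fetch is exposed, which is the behaviour the talk credits for allowing operation without external SDRAM buffering.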
So, having touched on the specifics of the DMA controller, we now look at how this fits into the overall system. Most of the reference design has been implemented within SOPC Builder, which allows easy integration with the Altera VIP suite. We use the Altera CVI and CVO blocks to get video into SOPC Builder; these are designed to interface with the Altera SDI MegaCore IP. Omnitek have added a couple of extra components outside of SOPC Builder to support SMPTE 352 and report the incoming video frame rate. We use our own PCIe translation block rather than the Altera PCI Express Compiler SOPC flow component, as it allows us to bind the DMA controller more closely to the hard IP. However, in a similar fashion, extra PCIe BARs are mapped to Avalon-MM masters, which allows them to access register space within SOPC Builder. BAR1, as well as accessing the CVI and CVO blocks, also addresses some Omnitek control registers.
It is worth mentioning that while the reference designs are targeted at video, the DMA controller itself just moves bytes around. It does not care what those bytes represent. If in the GUI you define a channel to be of “video” type, then additional Avalon-ST video interface components are included for that channel. These pack the video data into 32-bit words for the DMA controller. If in the GUI you set a channel to be of “data” type, then these interface components are omitted. You can also see in this slide that the more traditional MDMA channels are mapped to an Avalon-MM master bus that could connect to other IP, such as external memory controllers.
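The talk does not say how the interface components lay samples out inside each 32-bit word. As a purely hypothetical illustration of what such packing might look like, the Python sketch below places three 10-bit samples in each word, first sample in the least significant bits, and leaves the top two bits unused. The layout, the constants, and the function names are all assumptions made for this example.

```python
SAMPLE_BITS = 10       # assumed sample width (hypothetical)
SAMPLES_PER_WORD = 3   # assumed: three samples per 32-bit word, 2 bits spare
SAMPLE_MASK = (1 << SAMPLE_BITS) - 1


def pack_samples(samples):
    """Pack 10-bit samples into 32-bit words, first sample in the low bits."""
    words = []
    for i in range(0, len(samples), SAMPLES_PER_WORD):
        word = 0
        for slot, sample in enumerate(samples[i:i + SAMPLES_PER_WORD]):
            if not 0 <= sample <= SAMPLE_MASK:
                raise ValueError(f"sample {sample} does not fit in 10 bits")
            word |= sample << (slot * SAMPLE_BITS)
        words.append(word)
    return words


def unpack_samples(words, count):
    """Inverse of pack_samples; count trims padding in the last word."""
    samples = []
    for word in words:
        for slot in range(SAMPLES_PER_WORD):
            samples.append((word >> (slot * SAMPLE_BITS)) & SAMPLE_MASK)
    return samples[:count]


packed = pack_samples([1, 2, 3, 4])
print([hex(w) for w in packed])
print(unpack_samples(packed, 4))
```

A "data" type channel would skip this step entirely, since the DMA controller moves the bytes without interpreting them.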
In reality, this design process within SOPC Builder is a simple exercise of dropping in components and wiring the buses together. The diagram shows a one-channel design. To add more channels, you would simply double-click on the DMA component to bring up the customization GUI and add another DMA channel. You would then insert the Altera CVI or CVO components and wire the video bus across to the new DMA channel. Because the DMA controller is an SOPC Builder component, the user can access the full Altera VIP suite of components in their design. If you wanted to recreate the same design outside of SOPC Builder, the DMA channels are defined using the documented Avalon format.
Having transferred the frames of video into system PC memory, they are passed to DirectShow. This then DMAs the video from system memory to the graphics card and displays the video on screen. Of course, a user design may choose to deal with the frames of video differently. Full source code for the driver and example application gives users a head start on creating their own designs, even with limited knowledge of PCIe. The example app itself shows the 4 input SDI streams as DirectShow windows. The video has been resized in the GPU; the DMA controller (in this design) always passes back the complete active video without scaling. The app also allows the user to load 4 short video clips into system memory, from where they are played out as a continuous loop.
To put these performance figures into context, 4-lane Gen2 PCIe provides 16 Gbit/sec after accounting for 8b10b. However, after you have accounted for packet headers, CRCs, flow control, and ACK/NAK, the maximum achievable bandwidth is closer to 13.5 Gbit/sec. You can see from these figures that we are getting reasonably close to this achievable maximum. This slide shows the bandwidth used to support 4*1080p60 in both directions. The 1.65 Gbit/sec for read requests is due to the nature of PCIe. In order to move the 10.6 Gbit/sec of data from the PC to the FPGA, the DMA controller must send read packets to the PC asking the PCIe root complex in the Northbridge to send that data back. As you can see, this overhead is quite a significant bandwidth in itself. You can also see that the DMA controller is quite a small part of an FPGA. The S4GX 230 has over 1200 M9Ks and over 90,000 ALMs (adaptive logic modules), each comprising 2 LUTs and 2 registers.
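The talk does not break down where the 10.6 Gbit/sec comes from. One set of assumptions that happens to reproduce it is: active video only (1920 x 1080 at 60 frames per second), 4:2:2 sampling with 10-bit samples (two samples per pixel), and three 10-bit samples carried in each 32-bit word. The Python sketch below works through that arithmetic. These assumptions are inferred for illustration and are not confirmed by the source.

```python
# All parameters below are assumptions chosen for illustration.
WIDTH, HEIGHT, FPS = 1920, 1080, 60
SAMPLES_PER_PIXEL = 2      # 4:2:2: one luma plus one chroma sample per pixel
SAMPLE_BITS = 10
SAMPLES_PER_WORD = 3       # three 10-bit samples per 32-bit word (assumed)
WORD_BITS = 32
CHANNELS = 4
PRACTICAL_LINK_GBPS = 13.5  # achievable Gen2 x4 bandwidth quoted in the talk

samples_per_frame = WIDTH * HEIGHT * SAMPLES_PER_PIXEL
words_per_frame = -(-samples_per_frame // SAMPLES_PER_WORD)  # ceiling division

# Bandwidth of the raw samples versus the same samples packed into words.
unpacked_gbps = samples_per_frame * SAMPLE_BITS * FPS * CHANNELS / 1e9
packed_gbps = words_per_frame * WORD_BITS * FPS * CHANNELS / 1e9
link_fraction = packed_gbps / PRACTICAL_LINK_GBPS

print(f"Unpacked payload: {unpacked_gbps:.2f} Gbit/sec")
print(f"Packed payload:   {packed_gbps:.2f} Gbit/sec")
print(f"Share of achievable link bandwidth: {link_fraction:.1%}")
```

Under these assumptions the unpacked payload comes to just under 10 Gbit/sec and the packed payload to about 10.6 Gbit/sec, or roughly 79% of the 13.5 Gbit/sec achievable maximum in one direction.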
Key message - PCIe is difficult. Omnitek has a long background not only related to video but also PCIe. By using our DMA controller you do not need PCIe hardware experience.
Hello, this is Brian Jentz again. Roger, thanks for the detailed explanation of the SDI to PCIe design. Let’s wrap up by looking at where to go for more information and how to evaluate the solution.
You can go to Altera’s website to get access to the design for evaluation. You can also download user guides from that location. As mentioned during the webcast, this design runs on both Stratix IV GX and Arria II GX. We have provided links for those kits here.
Thank you for viewing today’s webcast, Upgrade Your Broadcast System to PCIe Gen2, brought to you by Altera Corporation. We would like to get your feedback about this webcast, so please fill out the survey that will open on your screen at the conclusion of this program. If you still have questions, please click on the “Ask a Question” button and you will receive a response via email within 3 business days. My name is Seyi Verma and on behalf of Altera Corporation, thank you for joining us today.