Full hardware video processing engine simplifies video system design

Digital video processing has long been a central concern in multimedia device design. Digital video standards are numerous and continue to evolve, so a system design must support as wide a range of video formats as possible. The traditional choice is to encode and decode in software on a DSP. However, with the rapid spread of 1080i/p high-definition video applications, the required computation has increased dramatically, and the advantages of hardware-accelerator-based solutions are becoming apparent. This approach greatly reduces processor load and helps meet the demanding low-power requirements of mobile devices. More and more system solutions are now adopting designs based on a full hardware video processing engine, or video processing unit (VPU).

Freescale's i.MX53 application processor is a typical example of a hardware-accelerator-based architecture. Its embedded full hardware VPU supports a very wide range of video formats, from H.264, MPEG-4, and DivX to RV10, covering the vast majority of video content, and it supports 1080i/p HD decoding and 720p encoding. In addition, the processor can decode multiple video streams simultaneously and perform full-duplex multi-stream encoding, with each stream allowed to use a different format, which enables dual-display configurations or video teleconferencing applications.

Typical hardware video processing engine structure

Unlike a full hardware VPU in the usual sense, this VPU has the significant advantage of being programmable to a certain extent, so the codec flow can be updated. This is because it has a small built-in 16-bit programmable DSP. This processor, named BIT, executes different firmware to flexibly control the encoding and decoding flow and to handle the interface with the CPU.

For the CPU, controlling the VPU requires no more than 1 MIPS of computation, and this low requirement is also thanks to the BIT processor. The BIT processor contains a dedicated hardware accelerator for bitstream processing and implements functions including frame rate control, FMO, ASO, video codec control, and error recovery. Most submodules in the VPU are also highly optimized and are shared across the encoding and decoding of the various video formats, which reduces gate count and power consumption.

The VPU structure of the i.MX53 is shown in Figure 1. The VPU is connected to the ARM processor through standard AXI/APB buses, so it can access the on-chip cache for high performance. The VPU consists of two main components: the video codec processing IP and the VPU bus converter. The former is the core of the entire VPU and is mainly composed of the embedded BIT processor, the video codec, and a bus arbiter; the latter converts the AMBA APB3 bus into the IP Sky Blue bus used inside the VPU.

Figure 1. VPU structure of the i.MX53

Video decoding process

Because the BIT processor implements such a complete control flow, the VPU is highly autonomous from the external CPU's point of view. All the CPU has to do is manage the processes related to the VPU. Note that a process here is not a system process in the usual sense but a dedicated process inside the VPU.

The VPU can process up to four video streams in different formats at the same time, and the processing flow is the same for each. Every stream starts with process creation (the system creates and sets up a dedicated process), moves on to running the process (the system can run a process once the decoder is idle and the bitstream is already in memory), and ends with exiting the process.

When several processes are ready to run, each is assigned a unique process index based on the order of creation. For example, when one MPEG-4 stream, one H.264 stream, one MPEG-2 stream, and one VC-1 stream are decoded simultaneously, the MPEG-4 decoding process is assigned index 0 and the VC-1 decoding process is assigned index 3.

In a multi-process environment, processes have no execution priority. After creating all the processes, the CPU starts the BIT processor to execute them, and the BIT processor schedules them with a mechanism similar to time slicing.
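
The index assignment and priority-free scheduling described above can be illustrated with a small Python model. This is only a sketch: the `InstanceTable` class and its methods are hypothetical names, and strict rotation is an assumption standing in for the BIT processor's actual time-slice-like mechanism, which the article does not detail.

```python
from collections import deque

MAX_INSTANCES = 4  # the article states the VPU handles up to 4 streams at once


class InstanceTable:
    """Toy model of VPU-internal processes; not the real firmware logic."""

    def __init__(self):
        self.formats = []      # list position doubles as the process index
        self.ready = deque()   # indices waiting for their next turn

    def create(self, video_format):
        if len(self.formats) >= MAX_INSTANCES:
            raise RuntimeError("all VPU instances are in use")
        index = len(self.formats)   # index follows creation order
        self.formats.append(video_format)
        self.ready.append(index)
        return index

    def next_turn(self):
        """Pick the next process in strict rotation, with no priorities."""
        index = self.ready.popleft()
        self.ready.append(index)
        return index


table = InstanceTable()
for fmt in ("MPEG-4", "H.264", "MPEG-2", "VC-1"):
    table.create(fmt)

order = [table.next_turn() for _ in range(6)]
print(order)  # [0, 1, 2, 3, 0, 1]
```

With the four streams from the example, MPEG-4 receives index 0 and VC-1 receives index 3, and every process gets a turn before any process gets a second one.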

Stepping outside the VPU, consider how it operates from the perspective of the entire system, taking as an example the simultaneous decoding of one H.264 stream and one MPEG-4 stream.

First, initialize the VPU. This includes loading the firmware required by the BIT processor into memory and setting initialization parameters such as the BIT processor configuration, the working buffer base address, the BIT code address, and the bitstream buffer control settings.

Then create the decoding processes for the H.264 stream and the MPEG-4 stream, which includes setting the base address and size of each bitstream buffer, the base address of the frame buffer, and so on.

The processes then execute alternately. A flag (Wait BusyFlag) indicates whether the current frame has finished decoding, and each decoded frame is sent to the image processing unit (IPU) for post-processing and display.

Finally, once decoding is complete, release the associated memory resources and destroy the processes.
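
The four steps above can be sketched as host-side control flow. The Python below runs against a `FakeVpu` stand-in so that the sequence can be followed without hardware. Every class and method name here is hypothetical and is not Freescale's driver API; the frame counts are made up for illustration.

```python
class FakeVpu:
    """Stand-in for the VPU so the control flow can run anywhere."""

    def __init__(self):
        self.initialized = False
        self.next_index = 0
        self.instances = {}   # process index -> frames still to decode
        self.busy = False
        self.log = []

    def init(self, firmware):
        # Real hardware: load BIT firmware, set work buffer and code addresses.
        self.initialized = bool(firmware)
        self.log.append("init")

    def open(self, fmt, frames):
        index = self.next_index
        self.next_index += 1
        self.instances[index] = frames
        self.log.append(f"open {fmt}")
        return index

    def start_frame(self, index):
        self.busy = True
        self.instances[index] -= 1

    def wait_not_busy(self):
        # Real hardware: poll the busy flag or wait for an interrupt.
        self.busy = False

    def close(self, index):
        del self.instances[index]
        self.log.append(f"close {index}")


def decode_all(vpu, streams):
    vpu.init(firmware=b"\x00")                # placeholder firmware image
    handles = [vpu.open(fmt, n) for fmt, n in streams]
    displayed = []
    while any(vpu.instances[h] > 0 for h in handles):
        for h in handles:                     # alternate between processes
            if vpu.instances[h] == 0:
                continue
            vpu.start_frame(h)
            vpu.wait_not_busy()               # frame is done once busy clears
            displayed.append(h)               # hand the frame to the IPU here
    for h in handles:
        vpu.close(h)                          # release resources
    return displayed


vpu = FakeVpu()
shown = decode_all(vpu, [("H.264", 2), ("MPEG-4", 3)])
print(shown)  # [0, 1, 0, 1, 1]
```

The output shows the two processes taking turns until the shorter stream runs out, after which the remaining process finishes alone.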

"Electronic System Design"

Memory control is a key issue when using the VPU

The VPU has full access to external memory, which it uses to load and store image frames, bitstreams, and the BIT processor's code and data. The amount of memory used depends on the video format and the target application. For example, H.264 decoding uses up to 16 reference frames, whereas H.263 decoding needs only one. Different formats also need temporary memory of different sizes for deblocking or overlap smoothing filters.
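
To get a feel for how strongly the reference frame count drives memory use, here is a rough Python estimate. It assumes 8-bit YUV 4:2:0 frames (1.5 bytes per pixel) with dimensions rounded up to 16-pixel macroblock boundaries; the function names are illustrative, and the frame counts are simply the upper figures quoted above, so a real stream may need a different number.

```python
def align_up(value, multiple):
    """Round value up to the next multiple."""
    return (value + multiple - 1) // multiple * multiple


def frame_bytes(width, height):
    # Assumption: 8-bit YUV 4:2:0, dimensions padded to 16-pixel macroblocks.
    w = align_up(width, 16)
    h = align_up(height, 16)
    return w * h * 3 // 2


def frame_pool_bytes(width, height, frame_count):
    """Memory for frame_count frames of the given size."""
    return frame_bytes(width, height) * frame_count


print(frame_bytes(1920, 1080))           # 3133440  (1080 is padded to 1088)
print(frame_pool_bytes(1920, 1080, 16))  # 50135040 (16 frames at 1080p)
print(frame_pool_bytes(352, 288, 1))     # 152064   (one CIF frame)
```

Under these assumptions a 16-frame pool at 1080p comes to roughly 50 MB, while a single CIF frame needs about 150 KB, which is why the frame buffer allocation has to be sized per format and per application.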

The VPU uses six distinct storage areas: the frame buffer (stores image frames), the BIT processor code memory, the working buffer (holds intermediate data for the BIT processor and the video decoding hardware), the bitstream buffer (holds the incoming bitstream), the parameter buffer (holds BIT processor command arguments and return data), and the search RAM (used by the motion estimation (ME) module to reduce the load on the external memory bus).

Handling of the bitstream buffer is especially critical. The system must allocate a separate bitstream buffer for each process. Each external bitstream buffer is organized as a ring buffer, and once the BIT processor has the buffer's starting address it wraps around automatically.

During decoding, the CPU writes the bitstream into the buffer and the BIT processor reads it out. If the two are not well coordinated, unread data is overwritten (overflow) or the decoder runs out of data (underflow), and once either happens decoding fails. To prevent this, the external CPU and the BIT processor inside the VPU must exchange the current read and write pointers of the bitstream buffer. Both the write pointer maintained by the CPU and the read pointer maintained by the BIT processor are written to internal registers. The BIT processor compares the two pointers to determine whether the buffer is running short of data. If it is, decoding must pause, so that the stream is not misread, until the CPU has written enough data and updated the write pointer. Conversely, before writing data into the ring buffer the CPU must check the read pointer to make sure that no unread data will be overwritten.
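
The pointer handshake described above can be modeled as a plain ring buffer. The following Python sketch is not the VPU's register interface: the `BitstreamRing` class is hypothetical, and keeping one byte unused to tell a full buffer from an empty one is an assumption chosen for simplicity.

```python
class BitstreamRing:
    """Toy ring buffer with separate read and write pointers."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.rd = 0   # advanced by the reader (the BIT processor's role)
        self.wr = 0   # advanced by the writer (the host CPU's role)

    def available(self):
        """Bytes written but not yet read."""
        return (self.wr - self.rd) % self.size

    def free_space(self):
        # One byte stays unused so that rd == wr always means "empty".
        return self.size - 1 - self.available()

    def write(self, data):
        if len(data) > self.free_space():
            return False   # would overwrite unread data; retry later
        for b in data:
            self.buf[self.wr] = b
            self.wr = (self.wr + 1) % self.size
        return True

    def read(self, count):
        if count > self.available():
            return None    # underflow: the decoder must stall
        out = bytearray()
        for _ in range(count):
            out.append(self.buf[self.rd])
            self.rd = (self.rd + 1) % self.size
        return bytes(out)


ring = BitstreamRing(8)
print(ring.write(b"abcde"))   # True
print(ring.write(b"xyz"))     # False, only 2 bytes are free
print(ring.read(6))           # None, only 5 bytes are available
print(ring.read(4))           # b'abcd'
print(ring.write(b"xyz12"))   # True, the write pointer wraps around
print(ring.read(6))           # b'exyz12'
```

The writer checks the read pointer before writing and the reader checks the write pointer before reading, which are the two checks the text assigns to the CPU and the BIT processor respectively.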

In applications such as 1080i/p high-definition decoding, the VPU requires very high memory bandwidth, and most current operating systems are multitasking, so memory bandwidth can easily run short, resulting in choppy playback or decoding errors. The use of system bandwidth must therefore be planned carefully.
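
A back-of-the-envelope budget check shows what such planning might look like. The Python below is a toy model only: it assumes 8-bit YUV 4:2:0 frames, and both the `reads_per_write` factor for reference-frame fetches and the bandwidth budget are hypothetical placeholders, not measured i.MX53 figures.

```python
def decode_bandwidth(width, height, fps, reads_per_write):
    """Bytes per second moved by one decode stream in this toy model."""
    frame = width * height * 3 // 2          # assumption: 1.5 bytes per pixel
    writes = frame * fps                     # decoded frames written out
    reads = int(writes * reads_per_write)    # reference-frame fetches
    return writes + reads


def fits(budget, demands):
    """True if the summed demands stay within the bandwidth budget."""
    return sum(demands) <= budget


# reads_per_write=2.0 is a hypothetical factor chosen only for illustration.
hd = decode_bandwidth(1920, 1080, 30, reads_per_write=2.0)
print(hd)  # 279936000

budget = 400_000_000                      # hypothetical bytes per second
print(fits(budget, [hd, 100_000_000]))    # True
print(fits(budget, [hd, 150_000_000]))    # False
```

The useful part is the shape of the calculation rather than the numbers: each client's demand is estimated separately and the sum is compared against what the memory system can sustain.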

Summary of this article

As the analysis above shows, the i.MX53's VPU is simple to use. By fully encapsulating the codec flow, the full hardware VPU hides the complexity of that flow and makes video processing as a whole an easy task, which is one of its significant advantages. Competition in the multimedia device market is currently fierce, and system manufacturers' product development schedules are compressed into a very short time. For video solutions, application processor suppliers must ensure that their reference designs provide simple, easy-to-use APIs along with verified reliability and real-time codec performance. A system design based on full hardware video processing is therefore a very attractive option.
