Xilinx ZynqMP: The read data is inconsistent with the data written by the PL #116
Please give more information.
u-dma-buf device tree:
u-dma-buf initialization:
u-dma-buf ClearBuf:
V4L2 USERPTR REQBUFS:
Program logic and problem description: the PL writes a value into the last four bytes of each buf, and every time the PS reads a buf, it writes 0 to those last four bytes. The problem is that when the PS obtains buf3, it occasionally reads 0 from the last four bytes, even though the PL has written a non-zero value there.
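The PS-side step described above (read the marker from the last four bytes, then write 0 back) can be sketched as follows. This is a minimal sketch, not the poster's code: the helper name `take_tail_marker` is hypothetical, and it assumes the buf is 4-byte aligned with a length that is a multiple of 4. `volatile` only stops the compiler from merging or dropping the two accesses; it does nothing about the CPU cache, so cache maintenance (sync for the CPU before the read, sync for the device after the write) is still needed around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the 32-bit marker the PL is assumed to
 * write into the last four bytes of a buf, then overwrite it with 0.
 * Assumes buf is 4-byte aligned and len is a multiple of 4. */
static uint32_t take_tail_marker(void *buf, size_t len)
{
    assert(buf != NULL && len >= 4 && len % 4 == 0);
    /* One 32-bit load and one 32-bit store, not merged or elided. */
    volatile uint32_t *tail = (volatile uint32_t *)((uint8_t *)buf + len - 4);
    uint32_t marker = *tail;
    *tail = 0;
    return marker;
}
```

One hazard worth keeping in mind, stated as a general property of write-back caches rather than a diagnosis from this thread: if the written 0 stays in a dirty cache line and is written back after the PL has stored a new marker, the PL's value is overwritten.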
How is cache coherency handled?
The code shown above is everything I do.
Please provide some suggestions. Thank you.
Supplement: the PL uses axi_hp to write to DDR. I tried the method "Manual cache management with the CPU cache still being enabled" from the u-dma-buf README. How should I solve this problem?
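For reference, the manual cache management in the u-dma-buf README works through sysfs attributes: write `sync_offset`, `sync_size` and `sync_direction`, then write 1 to `sync_for_cpu` (before the CPU reads what the device wrote) or to `sync_for_device` (after the CPU has written and before the device accesses the region). The sketch below assumes a device directory such as `/sys/class/u-dma-buf/udmabuf0`; the helper names are hypothetical, and `sync_direction` takes the values of the kernel's `enum dma_data_direction` (0 bidirectional, 1 to device, 2 from device).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "<dir>/<attr>" into path; returns 0 on success, -1 if it does not fit. */
static int attr_path(char *path, size_t size, const char *dir, const char *attr)
{
    int n = snprintf(path, size, "%s/%s", dir, attr);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write one decimal value to a sysfs attribute; 0 on success, -1 on error. */
static int write_attr(const char *dir, const char *attr, unsigned long value)
{
    char path[256];
    if (attr_path(path, sizeof path, dir, attr) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%lu", value) > 0;
    /* The buffered write reaches the attribute when the stream is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read a numeric attribute such as "size"; base 0 also accepts 0x-prefixed hex. */
static int read_attr(const char *dir, const char *attr, unsigned long *out)
{
    char path[256], text[64];
    if (attr_path(path, sizeof path, dir, attr) != 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(text, sizeof text, f);
    fclose(f);
    if (line == NULL)
        return -1;
    *out = strtoul(text, NULL, 0);
    return 0;
}

/* Sync [offset, offset + size) following the README's attribute sequence.
 * for_cpu != 0: sync_for_cpu, before the CPU reads data the PL wrote.
 * for_cpu == 0: sync_for_device, after the CPU wrote and before the PL uses it. */
static int udmabuf_sync(const char *dir, unsigned long offset,
                        unsigned long size, unsigned long direction, int for_cpu)
{
    assert(dir != NULL);
    if (write_attr(dir, "sync_offset", offset) != 0 ||
        write_attr(dir, "sync_size", size) != 0 ||
        write_attr(dir, "sync_direction", direction) != 0)
        return -1;
    return write_attr(dir, for_cpu ? "sync_for_cpu" : "sync_for_device", 1);
}
```

For the capture case in this thread, a call after DQBUF would look something like `udmabuf_sync("/sys/class/u-dma-buf/udmabuf0", offset_of_buf, buf_length_, 2, 1)`, where `offset_of_buf` is a hypothetical name for the buf's offset inside the u-dma-buf.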
How did you do this?
What is the value of buf_length_?
What I would like to know is when, exactly, you are syncing.
When the CPU uses a buf, roughly the following code is executed:
Umm...
After obtaining the V4L2 buf, I executed the sync code above.
It does not work at all.
What do you mean?
Why...?
Thank you very much for your answer. The purpose of using u-dma-buf is to improve the speed of memcpy. What should I do now?
I'll ask again: what is the value of buf_length_?
Both are 16,777,216.
Why does the PS side write 0 to the last four bytes of the buf every time it reads the buf?
Thank you very much for your help!
The next time the PL receives this buf, it decides what to do based on the value at this memory address.
OK! Main logic code:
Where is the value of *countBuf read and checked?
After DqBuf, it is checked at this location: "/// use this buf".
Thank you.
Yes, it works correctly with V4L2_MEMORY_MMAP.
What is the device driver?
It is better not to use memset() to clear u-dma-bufs. See #38.
OK, thank you! I have read that answer. I will keep monitoring the impact of memset and will consider dropping memset later on.
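If memset() is to be avoided, one option is a plain loop of aligned 32-bit stores. This is a minimal sketch under stated assumptions: the helper name is hypothetical, the buffer is assumed 4-byte aligned with a length that is a multiple of 4, and the reason given here for avoiding memset() (optimized library routines may use wide, unaligned or cache-zeroing stores that can misbehave on non-cached mappings) is an assumption; #38 has the actual discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero len bytes with aligned 32-bit stores only.
 * volatile keeps the compiler from turning the loop back into memset(). */
static void clear_words(void *buf, size_t len)
{
    assert(((uintptr_t)buf & 3u) == 0 && (len & 3u) == 0);
    volatile uint32_t *p = buf;
    for (size_t i = 0; i < len / 4; i++)
        p[i] = 0;
}
```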
Are you referring to the V4L2 driver? Is there anything I should pay attention to in this regard? Also, based on the code provided so far and my goal, I have two questions. First, is my usage of u-dma-buf roughly correct? Second, to achieve my goal of speeding up memcpy, is my overall approach correct? Thank you again for your assistance.
Yes. Which V4L2 driver are you using?
Specify sync_direction as 2: if the V4L2 driver is capturing, use DMA_FROM_DEVICE (= 2).
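The rule above amounts to a small mapping from how a buffer is used to the `sync_direction` value. In this sketch the enum and function names are hypothetical; only the numeric values come from the kernel's `enum dma_data_direction`.

```c
#include <assert.h>

/* Numeric values of the kernel's enum dma_data_direction. */
enum dma_dir {
    DIR_BIDIRECTIONAL = 0,
    DIR_TO_DEVICE = 1,
    DIR_FROM_DEVICE = 2,
};

/* Hypothetical buffer roles standing in for V4L2 queue types. */
enum buf_role {
    ROLE_CAPTURE,  /* device (PL) writes, CPU reads */
    ROLE_OUTPUT,   /* CPU writes, device reads */
    ROLE_SHARED,   /* both sides write */
};

static enum dma_dir sync_direction_for(enum buf_role role)
{
    switch (role) {
    case ROLE_CAPTURE:
        return DIR_FROM_DEVICE;
    case ROLE_OUTPUT:
        return DIR_TO_DEVICE;
    default:
        assert(role == ROLE_SHARED);
        return DIR_BIDIRECTIONAL;
    }
}
```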
I believe the method is correct.
There is another way to speed up mmap; see #107 for details. For more information on cache coherency, see https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency
Thank you.
OK! I have also considered this method, but it requires 4 axi_hp buses on the PL side.
The V4L2 driver has some custom logic that tells the PL the address of the writable buf. Should I provide the driver source code? If so, I will send it on Monday.
V4L2 driver:
Thank you.
I found that your V4L2 driver uses videobuf2-dma-contig.
static int alloc_port(struct extractor_dev *xdev)
{
	:
	q->mem_ops = &vb2_dma_contig_memops;
	:
}
With videobuf2-dma-contig, cache synchronization is performed automatically inside the Linux kernel, so there is no need to explicitly sync u-dma-bufs on the user application side.
vb2_ioctl_qbuf
Let's follow how vb2_ioctl_qbuf handles this.
vb2_ioctl_qbuf()
int vb2_ioctl_qbuf(struct file *file, void *priv, struct v4l2_buffer *p)
{
	struct video_device *vdev = video_devdata(file);
	if (vb2_queue_is_busy(vdev->queue, file))
		return -EBUSY;
	return vb2_qbuf(vdev->queue, vdev->v4l2_dev->mdev, p);
}
EXPORT_SYMBOL_GPL(vb2_ioctl_qbuf);
vb2_ioctl_qbuf() calls vb2_qbuf().
vb2_qbuf()
https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-v4l2.c#L802
int vb2_qbuf(struct vb2_queue *q, struct media_device *mdev,
	     struct v4l2_buffer *b)
{
	struct media_request *req = NULL;
	int ret;
	if (vb2_fileio_is_active(q)) {
		dprintk(q, 1, "file io in progress\n");
		return -EBUSY;
	}
	ret = vb2_queue_or_prepare_buf(q, mdev, b, false, &req);
	if (ret)
		return ret;
	ret = vb2_core_qbuf(q, b->index, b, req);
	if (req)
		media_request_put(req);
	return ret;
}
EXPORT_SYMBOL_GPL(vb2_qbuf);
vb2_qbuf() calls vb2_core_qbuf().
vb2_core_qbuf()
int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
		  struct media_request *req)
{
	struct vb2_buffer *vb;
	enum vb2_buffer_state orig_state;
	int ret;
	:
	:
	:
	switch (vb->state) {
	case VB2_BUF_STATE_DEQUEUED:
	case VB2_BUF_STATE_IN_REQUEST:
		if (!vb->prepared) {
			ret = __buf_prepare(vb);
			if (ret)
				return ret;
		}
		break;
	case VB2_BUF_STATE_PREPARING:
		dprintk(q, 1, "buffer still being prepared\n");
		return -EINVAL;
	default:
		dprintk(q, 1, "invalid buffer state %s\n",
			vb2_state_name(vb->state));
		return -EINVAL;
	}
	:
	:
	:
	dprintk(q, 2, "qbuf of buffer %d succeeded\n", vb->index);
	return 0;
}
EXPORT_SYMBOL_GPL(vb2_core_qbuf);
vb2_core_qbuf() calls __buf_prepare() to prepare the buffer if it has not been prepared yet.
__buf_prepare()
static int __buf_prepare(struct vb2_buffer *vb)
{
	struct vb2_queue *q = vb->vb2_queue;
	enum vb2_buffer_state orig_state = vb->state;
	int ret;
	if (q->error) {
		dprintk(q, 1, "fatal error occurred on queue\n");
		return -EIO;
	}
	if (vb->prepared)
		return 0;
	WARN_ON(vb->synced);
	if (q->is_output) {
		ret = call_vb_qop(vb, buf_out_validate, vb);
		if (ret) {
			dprintk(q, 1, "buffer validation failed\n");
			return ret;
		}
	}
	vb->state = VB2_BUF_STATE_PREPARING;
	switch (q->memory) {
	case VB2_MEMORY_MMAP:
		ret = __prepare_mmap(vb);
		break;
	case VB2_MEMORY_USERPTR:
		ret = __prepare_userptr(vb);
		break;
	case VB2_MEMORY_DMABUF:
		ret = __prepare_dmabuf(vb);
		break;
	default:
		WARN(1, "Invalid queue type\n");
		ret = -EINVAL;
		break;
	}
	if (ret) {
		dprintk(q, 1, "buffer preparation failed: %d\n", ret);
		vb->state = orig_state;
		return ret;
	}
	__vb2_buf_mem_prepare(vb);
	vb->prepared = 1;
	vb->state = orig_state;
	return 0;
}
__buf_prepare() calls __vb2_buf_mem_prepare() after preparing the buffer according to its memory type.
__vb2_buf_mem_prepare()
https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L323
static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
{
	unsigned int plane;
	if (vb->synced)
		return;
	vb->synced = 1;
	for (plane = 0; plane < vb->num_planes; ++plane)
		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
}
For videobuf2-dma-contig, call_void_memop() calls vb2_dc_prepare().
vb2_dc_prepare()
static void vb2_dc_prepare(void *buf_priv)
{
	struct vb2_dc_buf *buf = buf_priv;
	struct sg_table *sgt = buf->dma_sgt;
	/* This takes care of DMABUF and user-enforced cache sync hint */
	if (buf->vb->skip_cache_sync_on_prepare)
		return;
	if (!buf->non_coherent_mem)
		return;
	/* Non-coherent MMAP only */
	if (buf->vaddr)
		flush_kernel_vmap_range(buf->vaddr, buf->size);
	/* For both USERPTR and non-coherent MMAP */
	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
}
For a USERPTR buffer, vb2_dc_prepare() calls dma_sync_sgtable_for_device(), as long as skip_cache_sync_on_prepare is not set and non_coherent_mem is set.
vb2_buffer_done
Next, let's follow vb2_buffer_done(), which is called by the V4L2 driver.
vb2_buffer_done()
void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{
	struct vb2_queue *q = vb->vb2_queue;
	unsigned long flags;
	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
		return;
	if (WARN_ON(state != VB2_BUF_STATE_DONE &&
		    state != VB2_BUF_STATE_ERROR &&
		    state != VB2_BUF_STATE_QUEUED))
		state = VB2_BUF_STATE_ERROR;
#ifdef CONFIG_VIDEO_ADV_DEBUG
	/*
	 * Although this is not a callback, it still does have to balance
	 * with the buf_queue op. So update this counter manually.
	 */
	vb->cnt_buf_done++;
#endif
	dprintk(q, 4, "done processing on buffer %d, state: %s\n",
		vb->index, vb2_state_name(state));
	if (state != VB2_BUF_STATE_QUEUED)
		__vb2_buf_mem_finish(vb);
	spin_lock_irqsave(&q->done_lock, flags);
	if (state == VB2_BUF_STATE_QUEUED) {
		vb->state = VB2_BUF_STATE_QUEUED;
	} else {
		/* Add the buffer to the done buffers list */
		list_add_tail(&vb->done_entry, &q->done_list);
		vb->state = state;
	}
	atomic_dec(&q->owned_by_drv_count);
	if (state != VB2_BUF_STATE_QUEUED && vb->req_obj.req) {
		media_request_object_unbind(&vb->req_obj);
		media_request_object_put(&vb->req_obj);
	}
	spin_unlock_irqrestore(&q->done_lock, flags);
	trace_vb2_buf_done(q, vb);
	switch (state) {
	case VB2_BUF_STATE_QUEUED:
		return;
	default:
		/* Inform any processes that may be waiting for buffers */
		wake_up(&q->done_wq);
		break;
	}
}
EXPORT_SYMBOL_GPL(vb2_buffer_done);
If state is VB2_BUF_STATE_DONE (or VB2_BUF_STATE_ERROR), vb2_buffer_done() calls __vb2_buf_mem_finish().
__vb2_buf_mem_finish()
https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L339
static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
{
	unsigned int plane;
	if (!vb->synced)
		return;
	vb->synced = 0;
	for (plane = 0; plane < vb->num_planes; ++plane)
		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
}
For videobuf2-dma-contig, call_void_memop() calls vb2_dc_finish().
vb2_dc_finish()
static void vb2_dc_finish(void *buf_priv)
{
	struct vb2_dc_buf *buf = buf_priv;
	struct sg_table *sgt = buf->dma_sgt;
	/* This takes care of DMABUF and user-enforced cache sync hint */
	if (buf->vb->skip_cache_sync_on_finish)
		return;
	if (!buf->non_coherent_mem)
		return;
	/* Non-coherent MMAP only */
	if (buf->vaddr)
		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
	/* For both USERPTR and non-coherent MMAP */
	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
}
For a USERPTR buffer, vb2_dc_finish() calls dma_sync_sgtable_for_cpu(), as long as skip_cache_sync_on_finish is not set and non_coherent_mem is set.
Conclusion
With videobuf2-dma-contig, cache synchronization is performed automatically inside the Linux kernel, so there is no need to explicitly sync u-dma-bufs on the user application side. The cause of the problem is becoming more and more difficult to understand.
Many thanks. If u-dma-buf allocates one more buffer's worth of memory than V4L2 requires, the problem no longer occurs. That is, I allocate the u-dma-buf with a size of (req_count + 1) * buf_length_, while the number of bufs registered with V4L2 is req_count, each of size buf_length_.
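The workaround above can be expressed as a small layout calculation. A minimal sketch, assuming the u-dma-buf region is contiguous and each V4L2 buffer i is passed as USERPTR at base + i * buf_length_; the function names are hypothetical. Only the first req_count slots are registered, and the last slot is left unused as a spare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if region_size holds req_count buffers plus one spare slot,
 * i.e. (req_count + 1) * buf_length <= region_size, else -1.
 * Compared via division so the product cannot overflow. */
static int check_layout(size_t region_size, size_t buf_length, uint32_t req_count)
{
    if (buf_length == 0 || req_count == 0)
        return -1;
    return (req_count < region_size / buf_length) ? 0 : -1;
}

/* Offset of registered buffer i inside the region (0 <= i < req_count). */
static size_t slot_offset(size_t buf_length, uint32_t req_count, uint32_t i)
{
    assert(i < req_count);
    return (size_t)i * buf_length;
}
```

With buf_length_ = 16,777,216 and req_count = 4, the region must be at least 5 * 16,777,216 bytes, and buffer 3 starts at offset 3 * 16,777,216.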
What does the following print?
printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);
What is the value of req_count?
What is the value of stripBufSize?
What happens next here? Do you have the source code?
video_buf_.resize(req_count);
What happens next here? Do you have the source code?
dq_video_bufs_.resize(req_count);
Both are 16,777,216.
During testing, it was 4.
It is also 16,777,216.
It is only used in DqBuf.
Please ignore it; it is currently not used by any actual interface.
What is the value of reqBufs.count after the following ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)?
bool V4l2FdUserPtr::ReqBufs(uint32_t req_count) {
    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);
    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count = req_count;
    reqBufs.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){
Is it the same value as req_count?
reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);
Where is the source code for the UdmaBuf class?
Yes, it is.
Where is ClearBuf located?
Umm...
Sorry.
Thank you!
The source code you gave us uses memset to clear the u-dma-buf. Have you checked the behavior without memset?
Yes, I have checked.
Nothing happened?
The behavior is the same.
Yes, it has been cleared.
Mismatching.
Sorry to bother you with questions again. Previously, my V4L2 USERPTR + u-dma-buf setup ran well on the ZynqMP platform. Now I am doing the same thing on the Zynq platform: the software code and logic are the same, and the only difference is the system version. To recap my previous logic:
This logic runs normally on the ZynqMP platform. On the Zynq platform, I get different results:
Other test information:
I am sure that my code logic is correct, since it has been working normally on the ZynqMP platform. However, the problem on the Zynq platform confuses me. Sorry to disturb you again.
Sorry, I have found that V4L2_MEMORY_USERPTR + u-dma-buf does not work on newer versions of Linux. As a result of this fix, allocating a buffer with u-dma-buf and passing it to V4L2 with V4L2_MEMORY_USERPTR no longer works. The post above points out that V4L2_MEMORY_USERPTR was already deprecated and that the fallback to VM_PFNMAP/VM_IO is fundamentally dangerous. It appears that the restrictions on using userspace memory areas as V4L2 buffers have been tightened, so buffers allocated with u-dma-buf can no longer be used. Because of this change, I can no longer run V4L2_MEMORY_USERPTR + u-dma-buf in my environment, and therefore I cannot reproduce your issue. For the problem of V4L2 running slowly on ZynqMP/Zynq/Raspberry Pi, newer Linux versions offer a solution using dma-heap.
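As a starting point for the dma-heap approach, the sketch below allocates a buffer from a dma-heap and brackets CPU access with `DMA_BUF_IOCTL_SYNC`. It is hedged: the heap path (for example `/dev/dma_heap/system`) depends on the kernel configuration, the helper names are hypothetical, and whether this matches the exact solution referred to above is an assumption. The returned fd would then be queued with `V4L2_MEMORY_DMABUF` (setting `m.fd` in `struct v4l2_buffer`) instead of `V4L2_MEMORY_USERPTR`, and mmap()ed for CPU access.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-buf.h>
#include <linux/dma-heap.h>

/* Hypothetical helper: allocate len bytes from the dma-heap device at
 * heap_path and return the new dma-buf fd, or -1 on error. */
static int dma_heap_alloc(const char *heap_path, uint64_t len)
{
    struct dma_heap_allocation_data data = {
        .len = len,
        .fd_flags = O_RDWR | O_CLOEXEC,
        .heap_flags = 0,   /* no heap flags are currently defined */
    };
    assert(heap_path != NULL);
    int heap_fd = open(heap_path, O_RDONLY | O_CLOEXEC);
    if (heap_fd < 0)
        return -1;
    int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
    close(heap_fd);   /* the dma-buf fd stays valid after this */
    return ret < 0 ? -1 : (int)data.fd;
}

/* Hypothetical helper: begin (begin != 0) or end CPU read/write access
 * to an mmap()ed dma-buf so the exporter can do cache maintenance. */
static int dmabuf_cpu_access(int dmabuf_fd, int begin)
{
    struct dma_buf_sync sync = {
        .flags = (begin ? DMA_BUF_SYNC_START : DMA_BUF_SYNC_END) | DMA_BUF_SYNC_RW,
    };
    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0 ? -1 : 0;
}
```

A typical per-frame pattern would be `dmabuf_cpu_access(fd, 1)`, CPU reads and writes through the mapping, then `dmabuf_cpu_access(fd, 0)` before the buffer is queued again.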
Great! I have read your solution and plan to try it on kernel version 5.15. Could you please help me confirm whether my current driver can be adapted to this approach? Thank you very much.
The problem is sporadic.
It always seems to happen at the last position of the buf.