Xilinx Zynqmp: The read data is inconsistent with the data written by the PL #116

Open
zhanghongg opened this issue Feb 19, 2024 · 50 comments

Comments

@zhanghongg

The problem is sporadic.
It seems to always happen at the last position of buf.

@ikwzm
Owner

ikwzm commented Feb 19, 2024

Please give more information.

@zhanghongg
Author

U-dma-buf devicetree:

	udmabuf@0x00 {
		compatible = "ikwzm,u-dma-buf";
		device-name = "udmabuf0";
		size = <0x14000000>;
	};

U-dma-buf initialization:

    fd_ = open("/dev/udmabuf0", O_RDWR);
    if (fd_ < 0) {
        log_error("UdmaBuf open failed!");
        assert(false);
    }
    user_ptr_ = mmap64(NULL, map_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, base_addr_);
    if (user_ptr_ != MAP_FAILED) {
        log_info("UdmaBuf mmap successful!");
        ClearBuf();
    }
    else {
        log_error("UdmaBuf mmap failed!");
    }
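
For comparison, here is a minimal sketch of the same open-then-mmap sequence with explicit failure handling and cleanup. It is not the poster's class: `MappedRegion` and its members are hypothetical names, the mapping offset is assumed to be 0 (as `reserved_base_addr` is in this thread), and plain `mmap` is used in place of `mmap64`.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical RAII wrapper: opens `path` and maps `size` bytes at offset 0.
// On any failure ok() returns false and nothing is left open or mapped.
class MappedRegion {
public:
    MappedRegion(const char* path, std::size_t size) : size_(size) {
        assert(size > 0);
        fd_ = open(path, O_RDWR);
        if (fd_ < 0) {
            return;  // open failed; ptr_ stays null
        }
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) {
            close(fd_);
            fd_ = -1;
            return;  // mmap failed; ptr_ stays null
        }
        ptr_ = p;
    }
    ~MappedRegion() {
        if (ptr_ != nullptr) munmap(ptr_, size_);
        if (fd_ >= 0) close(fd_);
    }
    MappedRegion(const MappedRegion&) = delete;
    MappedRegion& operator=(const MappedRegion&) = delete;

    bool ok() const { return ptr_ != nullptr; }
    unsigned char* data() const { return static_cast<unsigned char*>(ptr_); }
    std::size_t size() const { return size_; }

private:
    int fd_ = -1;
    void* ptr_ = nullptr;
    std::size_t size_ = 0;
};
```

A caller would construct it with the device path and check `ok()` before touching `data()`. Reporting the open failure separately from the mmap failure makes the two cases distinguishable in logs.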

U-dma-buf ClearBuf:

    unsigned char* tmp = reinterpret_cast<unsigned char*>(user_ptr_);
    memset(tmp, 0, map_size_);

V4l2 Userptr REQBUFS:

    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, req_count * buf_length_);

    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count  = req_count;
    reqBufs.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){
        struct v4l2_buffer videoBuf;
        memset(&videoBuf, 0, sizeof(struct v4l2_buffer));
        videoBuf.index = i;
        videoBuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        videoBuf.memory = V4L2_MEMORY_USERPTR;
        if(ioctl(fd_, VIDIOC_QUERYBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QUERYBUF failed: %s", strerror(errno));
            return false;
        }
        void* myPtr = reserved_memory_->GetMappAddr();
        video_buf_[i] = reinterpret_cast<unsigned char*>(myPtr) + i*buf_length_;
        videoBuf.m.userptr = reinterpret_cast<unsigned long>(video_buf_[i]);

        if(ioctl(fd_, VIDIOC_QBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QBUF failed: %s", strerror(errno));
            return false;
        }
    }

    return true;
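
The loop above gives buffer i the range starting at `i * buf_length_` inside the single u-dma-buf mapping. A minimal sketch of a bounds check for that arithmetic follows. `SliceOffset` is a hypothetical helper, not part of the posted code. With the 0x14000000-byte pool from the devicetree and the 16,777,216-byte buffer length reported later in this thread, exactly 20 buffers (indexes 0 to 19) fit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes the byte offset of buffer `index` inside a
// pool of `pool_size` bytes that is cut into slices of `buf_length` bytes.
// Returns false (and leaves *offset untouched) if the slice would not fit
// entirely inside the pool.
bool SliceOffset(std::uint64_t index, std::uint64_t buf_length,
                 std::uint64_t pool_size, std::uint64_t* offset) {
    assert(buf_length > 0);
    assert(offset != nullptr);
    // index must be below the number of whole slices in the pool.
    if (index >= pool_size / buf_length) {
        return false;
    }
    *offset = index * buf_length;
    return true;
}
```

Calling it once per buffer before `VIDIOC_QBUF` would catch a `req_count` that is too large for the reserved region.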

Program logic and problem description:

image

The PL writes a value into the last four bytes of buf, and the PS side resets the last four bytes of buf to 0 every time it reads buf. The problem is that when the PS obtains buf3, it occasionally reads 0 from the last four bytes even though the PL side has written a non-zero value there.

@ikwzm
Owner

ikwzm commented Feb 20, 2024

How are you handling cache coherency?

@zhanghongg
Author

The code shown above is everything I do

@zhanghongg
Author

Please provide some suggestions, thank you.

@zhanghongg
Author

Supplement: the PL uses AXI_HP to write to DDR.

I tried the "Manual cache management with the CPU cache still being enabled" method from the udmabuf README.
The problem still occurs.

May I ask how I should solve this problem?

@ikwzm
Owner

ikwzm commented Feb 22, 2024

> I tried the "Manual cache management with the CPU cache still being enabled" method from the udmabuf README.
> The problem still occurs.

How did you do this?

@ikwzm
Owner

ikwzm commented Feb 22, 2024

What is the value of buf_length_?
What is the value of videoBuf.length?

@ikwzm
Owner

ikwzm commented Feb 22, 2024

What I would like to know is not so much how, but when are you syncing?

@zhanghongg
Author

When the CPU uses buf, roughly the following code is executed:


    // set sync_offset
    {
        char attr[1024];
        unsigned long sync_offset = 16777216UL * index; // index: 0-19
        if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf(attr, "%lu", sync_offset); /* or sprintf(attr, "0x%lx", sync_offset); */
            write(fd, attr, strlen(attr));
            close(fd);
        }
    }
    // set sync_size
    {
        char attr[1024];
        unsigned long sync_size = 16777216;
        if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf(attr, "%lu", sync_size); /* or sprintf(attr, "0x%lx", sync_size); */
            write(fd, attr, strlen(attr));
            close(fd);
        }
    }
    // set sync_direction
    {
        char attr[1024];
        unsigned long sync_direction = 1;
        if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf(attr, "%lu", sync_direction);
            write(fd, attr, strlen(attr));
            close(fd);
        }
    }
    // set sync_for_cpu
    {
        char attr[1024];
        unsigned long sync_for_cpu = 1;
        if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf(attr, "%lu", sync_for_cpu);
            write(fd, attr, strlen(attr));
            close(fd);
        }
    }
    // set sync_for_device
    {
        char attr[1024];
        unsigned long sync_for_device = 0;
        if ((fd = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf(attr, "%lu", sync_for_device);
            write(fd, attr, strlen(attr));
            close(fd);
        }
    }

@ikwzm
Owner

ikwzm commented Feb 22, 2024

Umm...
I do not know what you are doing.
Please show me the source code, not the text.

@zhanghongg
Author

After obtaining the v4l2 buf, I executed the sync code above

@ikwzm
Owner

ikwzm commented Feb 22, 2024

After obtaining the v4l2 buf, I executed the sync code above

It does not work at all.

@zhanghongg
Author

> What I would like to know is not so much how, but when are you syncing?

What does this mean?

> It does not work at all.

Why?

@zhanghongg
Author

Thank you very much for your answer. The purpose of using udmabuf is to improve the speed of memcpy. What should I do now?

@ikwzm
Owner

ikwzm commented Feb 23, 2024

I ask again.

What is the value of buf_length_?
What is the value of videoBuf.length?

@zhanghongg
Author

Both are 16,777,216.

@ikwzm
Owner

ikwzm commented Feb 23, 2024

> The PL writes a value into the last four bytes of buf, and the PS side resets the last four bytes of buf to 0 every time it reads buf.

Why does the PS side write the last four bytes of buf as 0 every time it reads buf?
Can you show me the source code for this part?

@zhanghongg
Author

Thank you very much for your help!

> Why does the PS side write the last four bytes of buf as 0 every time it reads buf?

The next time the PL side receives this buf, it makes some decisions based on the value at this memory address.

> Can you show me the source code for this part?

OK!

Main logic code:

    unsigned char* buf = v4l2fd_.DqBuf();
    ...
    ... /// use this buf
    ...
    uint64_t* countBuf = reinterpret_cast<uint64_t*>(buf + stripBufSize - sizeof(uint64_t));
    *countBuf =0;
    v4l2fd_.QBuf();

unsigned char* DqBuf() {
    memset(&active_video_buf_, 0, sizeof(struct v4l2_buffer));
    active_video_buf_.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    active_video_buf_.memory = V4L2_MEMORY_USERPTR;
    if(ioctl(fd_, VIDIOC_DQBUF, &active_video_buf_)){
        log_error("v4l2 VIDIOC_DQBUF failed: %s", strerror(errno));
        return 0;
    }
    reserved_memory_->Sync(buf_length_, active_video_buf_.index*buf_length_, 1);
    return video_buf_[active_video_buf_.index];
}

bool V4l2FdUserPtr::QBuf() {
    reserved_memory_->Sync(buf_length_, active_video_buf_.index*buf_length_, 0);
    if(ioctl(fd_, VIDIOC_QBUF, &active_video_buf_)){
        log_error("v4l2 VIDIOC_QBUF failed: %s", strerror(errno));
        return false;
    }
    return true;
}

void UdmaBuf::Sync(unsigned long sync_size, unsigned long sync_offset,
                   unsigned int sync_direction) {
    int fd;
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_offset); /* or sprintf(attr, "0x%x", sync_offset); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_size); /* or sprintf(attr, "0x%x", sync_size); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_direction);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_cpu = sync_direction == 1? 1:0;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_for_cpu);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_device = sync_direction == 1? 0:1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_for_device);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

}

@ikwzm
Owner

ikwzm commented Feb 23, 2024

Where is the value of *countBuf read and checked?

@zhanghongg
Author

Main logic code:

unsigned char* buf = v4l2fd_.DqBuf();
...
... /// use this buf
...
uint64_t* countBuf = reinterpret_cast<uint64_t*>(buf + stripBufSize - sizeof(uint64_t));
*countBuf =0;
v4l2fd_.QBuf();

It is checked after DqBuf, at the location marked "/// use this buf".
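
To make the location of that counter concrete, here is a small sketch of reading and clearing the trailing counter. The helper names are hypothetical, and `strip_size` stands in for `stripBufSize`. Note that the description above speaks of four bytes, while the posted code addresses the last `sizeof(uint64_t)` (8) bytes; the sketch follows the code. It uses `memcpy` instead of a pointer cast, which is my substitution, and it performs no cache maintenance of any kind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: offset of the trailing 8-byte counter in a buffer
// of `strip_size` bytes.
inline std::size_t TrailerOffset(std::size_t strip_size) {
    assert(strip_size >= sizeof(std::uint64_t));
    return strip_size - sizeof(std::uint64_t);
}

// Reads the trailing counter without assuming anything about alignment.
inline std::uint64_t ReadTrailer(const unsigned char* buf, std::size_t strip_size) {
    std::uint64_t value = 0;
    std::memcpy(&value, buf + TrailerOffset(strip_size), sizeof(value));
    return value;
}

// Resets the trailing counter to 0, as the main logic does before QBuf().
inline void ClearTrailer(unsigned char* buf, std::size_t strip_size) {
    const std::uint64_t zero = 0;
    std::memcpy(buf + TrailerOffset(strip_size), &zero, sizeof(zero));
}
```

For a 16,777,216-byte buffer the counter starts at offset 16,777,208, so the read and the later clear both touch the last few bytes of the buffer.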

@ikwzm
Owner

ikwzm commented Feb 24, 2024

Thank you.
By the way, just to confirm: did it work correctly with V4L2_MEMORY_MMAP instead of V4L2_MEMORY_USERPTR?

@zhanghongg
Author

Yes, it can work correctly with V4L2_MEMORY_MMAP.

@ikwzm
Owner

ikwzm commented Feb 24, 2024

What is the device driver?

@ikwzm
Owner

ikwzm commented Feb 24, 2024

It is better not to use memset() to clear u-dma-bufs. See #38.

@zhanghongg
Author

> It is better not to use memset() to clear u-dma-bufs. See #38.

OK, thank you! I have read that answer. I will keep monitoring the impact of memset and consider dropping it later.

> What is the device driver?

Are you referring to the V4L2 driver? Is there anything I should pay attention to in this regard?

Also, based on the code provided so far and my purpose, I have two questions.

Firstly, is my usage of udmabuf roughly correct?

Secondly, given my goal of improving the speed of memcpy, is my approach headed in the right direction?

Thank you again for your assistance.

@ikwzm
Owner

ikwzm commented Feb 24, 2024

> What is the device driver?
>
> Are you referring to the V4L2 driver? Is there anything I should pay attention to in this regard?

Yes, what V4L2 driver are you using?

> Firstly, is my usage of udmabuf roughly correct?

Set sync_direction to 2, as shown below. If the V4L2 driver is capturing, use DMA_FROM_DEVICE (= 2).

void UdmaBuf::Sync(unsigned long sync_size, unsigned long sync_offset,
                   unsigned int sync_for_device) {
    int fd;
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_offset); /* or sprintf(attr, "0x%x", sync_offset); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_size); /* or sprintf(attr, "0x%x", sync_size); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned int   sync_direction = 2; // DMA_FROM_DEVICE if V4L2_BUF_TYPE_VIDEO_CAPTURE
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_direction);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    if (sync_for_device == 0) 
    {
        unsigned char  attr[1024];
        unsigned long  sync_for_cpu = 1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_for_cpu);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    else
    {
        unsigned char  attr[1024];
        unsigned long  sync_for_device = 1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%d", sync_for_device);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

}
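
The repeated open/write/close blocks above can be folded into one small helper. This is only a sketch under stated assumptions: `WriteAttr` and `SyncRegion` are hypothetical names, the sysfs directory is passed in rather than hard-coded, and sync_direction is fixed at 2 (DMA_FROM_DEVICE) as suggested above. It mirrors the same write sequence and adds return-value checks.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: writes `value` as decimal text to <dir>/<name>.
// Returns false if the attribute cannot be opened or fully written.
bool WriteAttr(const std::string& dir, const char* name, unsigned long value) {
    assert(name != nullptr);
    const std::string path = dir + "/" + name;
    const int fd = open(path.c_str(), O_WRONLY);
    if (fd < 0) {
        return false;
    }
    char text[32];
    const int len = std::snprintf(text, sizeof(text), "%lu", value);
    const bool ok =
        len > 0 && write(fd, text, static_cast<std::size_t>(len)) == len;
    close(fd);
    return ok;
}

// Hypothetical helper: sets offset, size and direction, then triggers either
// sync_for_device or sync_for_cpu. Stops at the first failing write.
bool SyncRegion(const std::string& dir, unsigned long offset,
                unsigned long size, bool for_device) {
    return WriteAttr(dir, "sync_offset", offset) &&
           WriteAttr(dir, "sync_size", size) &&
           WriteAttr(dir, "sync_direction", 2) &&  // 2 = DMA_FROM_DEVICE
           WriteAttr(dir, for_device ? "sync_for_device" : "sync_for_cpu", 1);
}
```

With the path used in this thread, a call would look like `SyncRegion("/sys/class/u-dma-buf/udmabuf0", offset, size, false)` after dequeuing a buffer. Checking the return value shows whether any of the sysfs writes failed, which the original blocks silently ignore.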

> Secondly, given my goal of improving the speed of memcpy, is my approach headed in the right direction?

I believe the method is correct.
However, it may not work with some V4L2 drivers.
Therefore, please let me know what you are using for your V4L2 driver.

@ikwzm
Owner

ikwzm commented Feb 24, 2024

There is another way to speed up mmap.
Use S_AXI_HPC0, S_AXI_HPC1, or S_AXI_ACP for the PL -> PS interface so that cache coherency is handled in hardware, and then set the dma-coherent property in the V4L2 driver's device tree node.
By doing so, the cache is also enabled when using V4L2_MEMORY_MMAP.

Please refer to #107 for details.

For more information on cache coherency, please refer to the following URL

https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency

@zhanghongg
Author

Thank you.

> There is another way to speed up mmap.
> Use S_AXI_HPC0, S_AXI_HPC1, or S_AXI_ACP for the PL -> PS interface so that cache coherency is handled in hardware, and then set the dma-coherent property in the V4L2 driver's device tree node.
> By doing so, the cache is also enabled when using V4L2_MEMORY_MMAP.
>
> Please refer to #107 for details.
>
> For more information on cache coherency, please refer to the following URL
>
> https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency

OK! I have also considered this method, but the PL side requires four AXI_HP buses.

> Therefore, please let me know what you are using for your V4L2 driver.

The V4L2 driver has some custom logic to tell the PL side the address of the writable buf. Should I provide you with the driver source code? If so, I will provide it on Monday.

@zhanghongg
Author

v4l2 driver:

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/ioport.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/pci.h>
#include <linux/random.h>
#include <linux/version.h>
#include <linux/mutex.h>
#include <linux/videodev2.h>
#include <linux/dma-mapping.h>
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/highmem.h>
#include <linux/freezer.h>
#include <media/videobuf-vmalloc.h>
#include <media/v4l2-device.h>
#include <media/v4l2-ioctl.h>
#include <linux/platform_device.h>
#include <linux/gpio.h>
#include <linux/of_gpio.h>

#include "extractor_sw.h"
#include "extractor_hw.h"

/* 
 * trans video driver
 */
#define MAX_WIDTH   8192
#define MAX_HEIGHT  2048
 
#define CAPTURE_DRV_NAME "Strip driver"
#define PVI_MODULE_NAME "Strip"

void __iomem  *extractor_base = 0;
void __iomem  *sta_info = 0;

static struct extractor_port *file2port(struct file *file)
{
	return container_of(file->private_data, struct extractor_port, fh);
}

static void start_extractor(struct extractor_dev *xdev, struct extractor_buffer *buf, unsigned char buf_flag)
{
	dma_addr_t dma_addr;
	dma_addr = vb2_dma_contig_plane_dma_addr(&buf->v4l2_buf.vb2_buf, 0);
	if(buf_flag == 0) {
		iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA_A);
		iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA2_A);
	}
	else {
		iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA_B);
		iowrite32(dma_addr, extractor_base + XWR2DDR_CTRL_ADDR_DATA_DATA2_B);
	}
	//v4l2_info(&xdev->v4l2_dev, "dma_addr=0x%x\n", dma_addr);
}
/*
static void stop_dma(void)
{
}
*/
/*
 * Videobuf operations
 */
static int extractor_queue_setup(struct vb2_queue *vq,
			   unsigned int *nbuffers, unsigned int *nplanes,
			   unsigned int sizes[], struct device *alloc_devs[])
{
	struct extractor_port *port = vb2_get_drv_priv(vq);

	*nplanes = 1;
	sizes[0] = port->sizeimage;
	
	return 0;
}

static void extractor_buf_queue(struct vb2_buffer *vb)
{
				// printk("\n%s Line%d\n",__func__,__LINE__);

	struct extractor_port *port = vb2_get_drv_priv(vb->vb2_queue);
	struct extractor_dev *dev = port->dev;
	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);

	struct extractor_buffer *buf = container_of(vbuf, struct extractor_buffer, v4l2_buf);

	unsigned long flags;

	spin_lock_irqsave(&dev->slock, flags);
	list_add_tail(&buf->list, &port->vidq);
	spin_unlock_irqrestore(&dev->slock, flags);
}

static int extractor_start_streaming(struct vb2_queue *vq, unsigned int count)
{
	struct extractor_port *port = vb2_get_drv_priv(vq);
	struct extractor_dev *dev = port->dev;

	struct extractor_buffer *buf;
	unsigned long flags;

	port->sequence = 0;

	//buf a
	buf = list_entry(port->vidq.next, struct extractor_buffer, list);
	buf->allow_dq = true;
	spin_lock_irqsave(&dev->slock, flags);
	list_del(&buf->list);
	list_add_tail(&buf->dq_list, &dev->extractor_bufs_a);
	spin_unlock_irqrestore(&dev->slock, flags);
	start_extractor(dev, buf, 0);

	//buf b
	buf = list_entry(port->vidq.next, struct extractor_buffer, list);
	buf->allow_dq = true;
	spin_lock_irqsave(&dev->slock, flags);
	list_del(&buf->list);
	list_add_tail(&buf->dq_list, &dev->extractor_bufs_b);
	spin_unlock_irqrestore(&dev->slock, flags);
	start_extractor(dev, buf, 1);

	vq->streaming = 1;
	//gpio_set_value(dev->gpio_sensor_en, 1);

	return 0;
}

/*
 * Abort streaming and wait for last buffer
 */
static void extractor_stop_streaming(struct vb2_queue *vq)
{
	struct extractor_port *port = vb2_get_drv_priv(vq);
	struct extractor_dev *dev = port->dev;	

	struct extractor_buffer *buf;

	/* release all active buffers */
	while (!list_empty(&dev->extractor_bufs_a)) {
		buf = list_entry(dev->extractor_bufs_a.next,
				struct extractor_buffer, dq_list);
		list_del(&buf->dq_list);
		vb2_buffer_done(&buf->v4l2_buf.vb2_buf, VB2_BUF_STATE_ERROR);
	}
	while (!list_empty(&dev->extractor_bufs_b)) {
		buf = list_entry(dev->extractor_bufs_b.next,
				struct extractor_buffer, dq_list);
		list_del(&buf->dq_list);
		vb2_buffer_done(&buf->v4l2_buf.vb2_buf, VB2_BUF_STATE_ERROR);
	}
	while (!list_empty(&port->vidq)) {
		buf = list_entry(port->vidq.next, struct extractor_buffer, list);
		list_del(&buf->list);
		vb2_buffer_done(&buf->v4l2_buf.vb2_buf, VB2_BUF_STATE_ERROR);
	}

	vq->streaming = 0;
}	

static struct vb2_ops extractor_video_qops = {
	.queue_setup		= extractor_queue_setup,
	.buf_queue			= extractor_buf_queue,
	.start_streaming	= extractor_start_streaming,
	.stop_streaming		= extractor_stop_streaming,
	.wait_prepare		= vb2_ops_wait_prepare,
	.wait_finish		= vb2_ops_wait_finish,
};

static int extractor_open(struct file *file)
{
	struct extractor_port *port = video_drvdata(file);
	struct extractor_dev *dev = port->dev;

	if (!dev->setup_done) {
		dev->setup_done = 1;
	}	

	port->width = MAX_WIDTH;
	port->height = MAX_HEIGHT;
    
	port->bytesperline = port->width;
	port->sizeimage = port->width * port->height;	

	v4l2_fh_init(&port->fh, video_devdata(file));
	file->private_data = &port->fh;
	v4l2_fh_add(&port->fh);
	port->open = 1;

	return 0;
}
 
static int extractor_release(struct file *file)
{
	struct extractor_port *port = video_drvdata(file);
	struct vb2_queue *q = &port->vb_vidq;

	extractor_stop_streaming(q);

	if (file->private_data) {
		v4l2_fh_del((struct v4l2_fh *)file->private_data);
		v4l2_fh_exit((struct v4l2_fh *)file->private_data);
	}

	vb2_queue_release(q);

	port->open = 0;

	return 0;
}

static const struct v4l2_file_operations extractor_fops = {
	.owner = THIS_MODULE,
	.open = extractor_open,
	.release = extractor_release,
	.unlocked_ioctl = video_ioctl2,
	.mmap = vb2_fop_mmap,
	.poll = vb2_fop_poll,
};

static int extractor_querycap(struct file *file, void *priv,
			struct v4l2_capability *cap)
{
	strncpy(cap->driver, CAPTURE_DRV_NAME, sizeof(cap->driver) - 1);
	strncpy(cap->card, PVI_MODULE_NAME, sizeof(cap->card) - 1);
	strlcpy(cap->bus_info, PVI_MODULE_NAME, sizeof(cap->bus_info));
	cap->device_caps  = V4L2_CAP_STREAMING | V4L2_CAP_VIDEO_CAPTURE;
	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
	
	return 0;
}

static int extractor_enum_fmt_vid_cap(struct file *file, void *priv,
				struct v4l2_fmtdesc *fmt)
{
	fmt->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	strcpy(fmt->description, "Raw Mode-Y8");
	fmt->pixelformat = V4L2_PIX_FMT_GREY;

	return 0;
}

static int extractor_g_fmt_vid_cap(struct file *file, void *priv,
			     struct v4l2_format *f)
{
	struct extractor_port *port = file2port(file);

	f->fmt.pix.width	= port->width;
	f->fmt.pix.height	= port->height;
	f->fmt.pix.bytesperline	= port->bytesperline;
	f->fmt.pix.sizeimage	= port->sizeimage;

	return 0;
}

static int extractor_try_fmt_vid_cap(struct file *file, void *fh, struct v4l2_format *format)
{
	return 0;
}

static int extractor_s_fmt_vid_cap(struct file *file, void *priv,
 			     struct v4l2_format *f)
{
	struct extractor_port *port = file2port(file);

	f->fmt.pix.bytesperline = f->fmt.pix.width;
	f->fmt.pix.sizeimage = f->fmt.pix.width * f->fmt.pix.height;

	port->width = f->fmt.pix.width;
	port->height = f->fmt.pix.height;
	port->bytesperline = f->fmt.pix.bytesperline;
	port->sizeimage	= f->fmt.pix.sizeimage;
    
    /* do something else */

	return 0;
}

static long extractor_ioctl_default(struct file *file, void *fh, bool valid_prio,
			      unsigned int cmd, void *arg)
{
	switch (cmd) {
	default:
		return -ENOTTY;
	}
}

static const struct v4l2_ioctl_ops extractor_ioctl_ops = {
	.vidioc_querycap 			= extractor_querycap,
	.vidioc_enum_fmt_vid_cap	= extractor_enum_fmt_vid_cap,

	.vidioc_g_fmt_vid_cap	= extractor_g_fmt_vid_cap,
	.vidioc_try_fmt_vid_cap	=extractor_try_fmt_vid_cap,
	.vidioc_s_fmt_vid_cap	= extractor_s_fmt_vid_cap,

	.vidioc_reqbufs		= vb2_ioctl_reqbufs,
	.vidioc_create_bufs	= vb2_ioctl_create_bufs,
	.vidioc_prepare_buf	= vb2_ioctl_prepare_buf,
	.vidioc_querybuf	= vb2_ioctl_querybuf,
	.vidioc_qbuf		= vb2_ioctl_qbuf,
	.vidioc_dqbuf		= vb2_ioctl_dqbuf,

	.vidioc_streamon	= vb2_ioctl_streamon,
	.vidioc_streamoff	= vb2_ioctl_streamoff,
	.vidioc_log_status	= v4l2_ctrl_log_status,
	.vidioc_default		= extractor_ioctl_default,
};

static int alloc_port(struct extractor_dev *xdev)
{
	struct extractor_port *port;
	struct vb2_queue *q;
	struct video_device *vfd;
	int ret;

	port = kzalloc (sizeof(*port), GFP_KERNEL);
	if (!port)
		return -ENOMEM;

	port->dev = xdev;
	port->open = 0;

	/*
	 * Initialize queue
	 */
	q = &port->vb_vidq;
	q->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	q->io_modes = VB2_MMAP | VB2_DMABUF |VB2_USERPTR;
	q->drv_priv = port;
	q->buf_struct_size = sizeof(struct extractor_buffer);
	q->ops = &extractor_video_qops;
	q->mem_ops = &vb2_dma_contig_memops;
	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
	q->lock = &xdev->mutex;
	q->dev = &(xdev->pdev->dev);
	ret = vb2_queue_init(q);
	if (ret)
		goto do_free_port;

	INIT_LIST_HEAD(&port->vidq);
	
	vfd = video_device_alloc();
	if (!vfd)
	{
		ret = -ENOMEM;
		goto do_free_port;
	}

	vfd->device_caps = V4L2_CAP_STREAMING | V4L2_CAP_VIDEO_CAPTURE;
	vfd->v4l2_dev = &xdev->v4l2_dev;
	vfd->queue = q;
	vfd->fops		= &extractor_fops;
	vfd->ioctl_ops	= &extractor_ioctl_ops;
	vfd->minor		= -1;
	vfd->release	= video_device_release;
	vfd->lock = &xdev->mutex;
	snprintf(vfd->name, sizeof(vfd->name), "%s", PVI_MODULE_NAME);
	video_set_drvdata(vfd, port);

	ret = video_register_device(vfd, VFL_TYPE_VIDEO, -1);  //VFL_TYPE_GRABBER
	if (ret) {
		v4l2_err(&xdev->v4l2_dev, "Failed to register video device\n");
		goto do_free_port;
	}

	port->vfd = vfd;
	xdev->port = port;

	v4l2_info(&xdev->v4l2_dev, "Device registered as /dev/video%d\n", vfd->num);
	return 0;

do_free_port:
	kfree(port);
	return ret;
}

static void free_port(struct extractor_port *port)
{
	if (!port)
		return;
	v4l2_info(&(port->dev->v4l2_dev), PVI_MODULE_NAME
			" Device /dev/video%d is removed\n", port->vfd->num);
	video_unregister_device(port->vfd);
	video_device_release(port->vfd);

	kfree(port);
}

static void extractor_active_buf_next(struct extractor_port *port, unsigned char buf_flag)
{
	struct extractor_dev *dev = port->dev;
	struct extractor_buffer *buf;
	unsigned long flags;

	spin_lock_irqsave(&dev->slock, flags);

	if (!list_empty(&port->vidq)) {
		// printk("\n%s Line%d\n",__func__,__LINE__);
		buf = list_first_entry(&port->vidq, struct extractor_buffer, list);
		if(list_is_last(&buf->list, &port->vidq ))
		{
			buf->allow_dq = false;
			// printk("\n%s Line%d\n",__func__,__LINE__);
		}
		else
		{
			buf->allow_dq = true;
		}
		list_del(&buf->list);
		if(buf_flag == 0) {
			list_add_tail(&buf->dq_list, &dev->extractor_bufs_a);
		}
		else {
			list_add_tail(&buf->dq_list, &dev->extractor_bufs_b);
		}

		start_extractor(dev, buf, buf_flag);
	}
	else
	{
	}
	
	spin_unlock_irqrestore(&dev->slock, flags);
}


static void extractor_process_buffer_complete(struct extractor_port *port, unsigned char buf_flag)
{
	struct extractor_dev *dev = port->dev;
	struct vb2_buffer *vb = NULL;
	struct extractor_buffer *buf;
	struct vb2_v4l2_buffer *v4l2_buf =NULL;
	unsigned long flags;
	if(buf_flag == 0) {
		if (list_empty(&dev->extractor_bufs_a))
			return;
		buf = list_first_entry(&dev->extractor_bufs_a, struct extractor_buffer, dq_list);
	}
	else {
		if (list_empty(&dev->extractor_bufs_b))
			return;
		buf = list_first_entry(&dev->extractor_bufs_b, struct extractor_buffer, dq_list);
	}

	if (buf) {
		v4l2_buf =  &buf->v4l2_buf;
		vb = &v4l2_buf->vb2_buf;
		v4l2_buf->sequence = *(u32 *)sta_info;
		spin_lock_irqsave(&dev->slock, flags);
		list_del(&buf->dq_list);
		spin_unlock_irqrestore(&dev->slock, flags);

		if (buf->allow_dq) 
		{
#if 0
			dma_addr_t dma_addr;		
			dma_addr = vb2_dma_contig_plane_dma_addr(vb, 0);
			dma_sync_single_for_cpu(&(dev->pdev->dev), dma_addr,vb->planes[0].length, DMA_FROM_DEVICE);
#endif			
			vb2_buffer_done(vb, VB2_BUF_STATE_DONE);
			buf->allow_dq = false;
		}
		else
		{
			spin_lock_irqsave(&dev->slock, flags);
			list_add_tail(&buf->list, &port->vidq);
			spin_unlock_irqrestore(&dev->slock, flags);
		}
	} 
	else
	{
		printk("%s:%s\n",__func__,"BUG().");
		BUG();
	}

	port->sequence++;
}

static irqreturn_t extractor_irq_a(int irq, void *data)	
{
	struct extractor_dev *dev = (struct extractor_dev *)data;
	struct extractor_port *port = dev->port;

	extractor_process_buffer_complete(port, 0);
	extractor_active_buf_next(port, 0);	
	
	return IRQ_HANDLED;
}

static irqreturn_t extractor_irq_b(int irq, void *data)	
{
	struct extractor_dev *dev = (struct extractor_dev *)data;
	struct extractor_port *port = dev->port;

	extractor_process_buffer_complete(port, 1);
	extractor_active_buf_next(port, 1);	
	
	return IRQ_HANDLED;
}

static int extractor_probe(struct platform_device *pdev)
{
	struct extractor_dev *vdev;
	struct resource *res;

	int ret = 0;

	//struct device_node *dev_node = pdev->dev.of_node;
	
	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
	if (!vdev)
		return -ENOMEM;
	
	/* extractor address space */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	if (res == NULL) {
		dev_err(&pdev->dev, "Missing platform resources data\n");
		ret = -ENODEV;
		goto free_dev;
	}

	if (!devm_request_mem_region(&pdev->dev, res->start, resource_size(res), pdev->name))
	{
		dev_err(&pdev->dev, "Failed to request  memory resources\n");
		ret = -ENOMEM;
		goto free_dev;
	}

	extractor_base = devm_ioremap(&pdev->dev, res->start,  resource_size(res));  //devm_ioremap_nocache
	if (!extractor_base){
		ret = -ENOMEM;
		goto free_dev;
	}

	sta_info = devm_ioremap(&pdev->dev, STA_INFO_BASE,  12);  //devm_ioremap_nocache
	if (!sta_info){
		ret = -ENOMEM;
		goto free_dev;
	}

	vdev->irq_a = platform_get_irq(pdev, 0);
	if (!vdev->irq_a) {
		dev_err(&pdev->dev, "Could not get extractor irq a");
		goto free_dev;
	}

	vdev->irq_b = platform_get_irq(pdev, 1);
	if (!vdev->irq_b) {
		dev_err(&pdev->dev, "Could not get extractor irq b");
		goto free_dev;
	}

	if (devm_request_irq(&pdev->dev, vdev->irq_a, extractor_irq_a,
				   IRQF_TRIGGER_RISING, PVI_MODULE_NAME, vdev) < 0) {
		ret = -ENOMEM;
		goto free_dev;
	}	

	if (devm_request_irq(&pdev->dev, vdev->irq_b, extractor_irq_b,
				   IRQF_TRIGGER_RISING, PVI_MODULE_NAME, vdev) < 0) {
		ret = -ENOMEM;
		goto free_dev;
	}	

	spin_lock_init(&vdev->slock);
	INIT_LIST_HEAD(&vdev->extractor_bufs_a);
	INIT_LIST_HEAD(&vdev->extractor_bufs_b);

	ret = v4l2_device_register(&pdev->dev, &vdev->v4l2_dev);
	if (ret)
		goto free_irq;

	mutex_init(&vdev->mutex);

	vdev->pdev = pdev;
	vdev->setup_done = 0;

	ret = alloc_port(vdev);
	if (ret)
		goto free_irq;

	platform_set_drvdata(pdev, vdev);

	return 0;
	
free_irq:
	free_irq(vdev->irq_a, vdev);		
	free_irq(vdev->irq_b, vdev);		

free_dev:
	kfree(vdev);

	return ret;
}

static int extractor_remove(struct platform_device *pdev)
{
	struct extractor_dev *dev = platform_get_drvdata(pdev);

	//gpio_free(dev->gpio_sensor_en);
	//devm_iounmap(&pdev->dev, dev->dma0_base);

	free_irq(dev->irq_a, dev);		
	free_irq(dev->irq_b, dev);		
	
	//if(dev->setup_done)
		//vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);		
	free_port(dev->port);
	kfree(dev);
	
	return 0;
}

#if defined (CONFIG_OF)
static const struct of_device_id extractor_of_match[] = {
	{
		.compatible = "titic,extract", .data = (void *) 1,
	},
	{},
};
#else
#define extractor_of_match NULL
#endif

static struct platform_driver extractor_pdrv = {
	.probe		= extractor_probe,
	.remove		= extractor_remove,
	.driver		= {
		.name	= CAPTURE_DRV_NAME,
		.owner	= THIS_MODULE,
		.of_match_table = extractor_of_match,
	},
};

static int extractor_init(void)
{
	return platform_driver_register(&extractor_pdrv);
}
 
static void extractor_exit(void)
{
	platform_driver_unregister(&extractor_pdrv);
}
 
module_init(extractor_init);
module_exit(extractor_exit);
 
MODULE_LICENSE("GPL");

@ikwzm
Owner

ikwzm commented Feb 26, 2024

Thank you

@ikwzm
Owner

ikwzm commented Feb 27, 2024

I found that your V4L2 driver uses videobuf2-dma-contig.

static int alloc_port(struct extractor_dev *xdev)
{
    :
	q->mem_ops = &vb2_dma_contig_memops;
    :    
}

With videobuf2-dma-contig, cache synchronization is automatically performed inside the Linux Kernel, so there is no need to explicitly sync u-dma-bufs on the user application side.

vb2_ioctl_qbuf

Let's follow how vb2_ioctl_qbuf handles this.

vb2_ioctl_qbuf()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-v4l2.c#L1052

int vb2_ioctl_qbuf(struct file *file, void *priv, struct v4l2_buffer *p)
{
	struct video_device *vdev = video_devdata(file);

	if (vb2_queue_is_busy(vdev->queue, file))
		return -EBUSY;
	return vb2_qbuf(vdev->queue, vdev->v4l2_dev->mdev, p);
}
EXPORT_SYMBOL_GPL(vb2_ioctl_qbuf);

vb2_ioctl_qbuf() calls vb2_qbuf().

vb2_qbuf()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-v4l2.c#L802

int vb2_qbuf(struct vb2_queue *q, struct media_device *mdev,
	     struct v4l2_buffer *b)
{
	struct media_request *req = NULL;
	int ret;

	if (vb2_fileio_is_active(q)) {
		dprintk(q, 1, "file io in progress\n");
		return -EBUSY;
	}

	ret = vb2_queue_or_prepare_buf(q, mdev, b, false, &req);
	if (ret)
		return ret;
	ret = vb2_core_qbuf(q, b->index, b, req);
	if (req)
		media_request_put(req);
	return ret;
}
EXPORT_SYMBOL_GPL(vb2_qbuf);

vb2_qbuf() calls vb2_core_qbuf()

vb2_core_qbuf()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L1651

int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
		  struct media_request *req)
{
	struct vb2_buffer *vb;
	enum vb2_buffer_state orig_state;
	int ret;
	:     
	:     
	:     
	switch (vb->state) {
	case VB2_BUF_STATE_DEQUEUED:
	case VB2_BUF_STATE_IN_REQUEST:
		if (!vb->prepared) {
			ret = __buf_prepare(vb);
			if (ret)
				return ret;
		}
		break;
	case VB2_BUF_STATE_PREPARING:
		dprintk(q, 1, "buffer still being prepared\n");
		return -EINVAL;
	default:
		dprintk(q, 1, "invalid buffer state %s\n",
			vb2_state_name(vb->state));
		return -EINVAL;
	}
	:     
	:     
	:     
	dprintk(q, 2, "qbuf of buffer %d succeeded\n", vb->index);
	return 0;
}
EXPORT_SYMBOL_GPL(vb2_core_qbuf);

vb2_core_qbuf() calls __buf_prepare() to prepare the buffer if it has not been prepared yet.

__buf_prepare()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L1406

static int __buf_prepare(struct vb2_buffer *vb)
{
	struct vb2_queue *q = vb->vb2_queue;
	enum vb2_buffer_state orig_state = vb->state;
	int ret;

	if (q->error) {
		dprintk(q, 1, "fatal error occurred on queue\n");
		return -EIO;
	}

	if (vb->prepared)
		return 0;
	WARN_ON(vb->synced);

	if (q->is_output) {
		ret = call_vb_qop(vb, buf_out_validate, vb);
		if (ret) {
			dprintk(q, 1, "buffer validation failed\n");
			return ret;
		}
	}

	vb->state = VB2_BUF_STATE_PREPARING;

	switch (q->memory) {
	case VB2_MEMORY_MMAP:
		ret = __prepare_mmap(vb);
		break;
	case VB2_MEMORY_USERPTR:
		ret = __prepare_userptr(vb);
		break;
	case VB2_MEMORY_DMABUF:
		ret = __prepare_dmabuf(vb);
		break;
	default:
		WARN(1, "Invalid queue type\n");
		ret = -EINVAL;
		break;
	}

	if (ret) {
		dprintk(q, 1, "buffer preparation failed: %d\n", ret);
		vb->state = orig_state;
		return ret;
	}

	__vb2_buf_mem_prepare(vb);
	vb->prepared = 1;
	vb->state = orig_state;

	return 0;
}

__buf_prepare() calls __vb2_buf_mem_prepare() after preprocessing the buffer by memory type.

__vb2_buf_mem_prepare()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L323

static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
{
	unsigned int plane;

	if (vb->synced)
		return;

	vb->synced = 1;
	for (plane = 0; plane < vb->num_planes; ++plane)
		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
}

For videobuf2-dma-contig, call_void_memop() calls vb2_dc_prepare().

vb2_dc_prepare()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L123

static void vb2_dc_prepare(void *buf_priv)
{
	struct vb2_dc_buf *buf = buf_priv;
	struct sg_table *sgt = buf->dma_sgt;

	/* This takes care of DMABUF and user-enforced cache sync hint */
	if (buf->vb->skip_cache_sync_on_prepare)
		return;

	if (!buf->non_coherent_mem)
		return;

	/* Non-coherent MMAP only */
	if (buf->vaddr)
		flush_kernel_vmap_range(buf->vaddr, buf->size);

	/* For both USERPTR and non-coherent MMAP */
	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
}

If the memory type is USERPTR, vb2_dc_prepare() calls dma_sync_sgtable_for_device(), unless one of the early returns above is taken.
This function synchronizes the cache for the device before the DMA transfer.

vb2_buffer_done

Next, let's follow vb2_buffer_done called by the V4L2 driver.

vb2_buffer_done()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L1058

void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
{
	struct vb2_queue *q = vb->vb2_queue;
	unsigned long flags;

	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
		return;

	if (WARN_ON(state != VB2_BUF_STATE_DONE &&
		    state != VB2_BUF_STATE_ERROR &&
		    state != VB2_BUF_STATE_QUEUED))
		state = VB2_BUF_STATE_ERROR;

#ifdef CONFIG_VIDEO_ADV_DEBUG
	/*
	 * Although this is not a callback, it still does have to balance
	 * with the buf_queue op. So update this counter manually.
	 */
	vb->cnt_buf_done++;
#endif
	dprintk(q, 4, "done processing on buffer %d, state: %s\n",
		vb->index, vb2_state_name(state));

	if (state != VB2_BUF_STATE_QUEUED)
		__vb2_buf_mem_finish(vb);

	spin_lock_irqsave(&q->done_lock, flags);
	if (state == VB2_BUF_STATE_QUEUED) {
		vb->state = VB2_BUF_STATE_QUEUED;
	} else {
		/* Add the buffer to the done buffers list */
		list_add_tail(&vb->done_entry, &q->done_list);
		vb->state = state;
	}
	atomic_dec(&q->owned_by_drv_count);

	if (state != VB2_BUF_STATE_QUEUED && vb->req_obj.req) {
		media_request_object_unbind(&vb->req_obj);
		media_request_object_put(&vb->req_obj);
	}

	spin_unlock_irqrestore(&q->done_lock, flags);

	trace_vb2_buf_done(q, vb);

	switch (state) {
	case VB2_BUF_STATE_QUEUED:
		return;
	default:
		/* Inform any processes that may be waiting for buffers */
		wake_up(&q->done_wq);
		break;
	}
}
EXPORT_SYMBOL_GPL(vb2_buffer_done);

If state is anything other than VB2_BUF_STATE_QUEUED (for example VB2_BUF_STATE_DONE), vb2_buffer_done() calls __vb2_buf_mem_finish().

__vb2_buf_mem_finish()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-core.c#L339

static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
{
	unsigned int plane;

	if (!vb->synced)
		return;

	vb->synced = 0;
	for (plane = 0; plane < vb->num_planes; ++plane)
		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
}

For videobuf2-dma-contig, call_void_memop() calls vb2_dc_finish().

vb2_dc_finish()

https://elixir.bootlin.com/linux/v6.1.70/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L143

static void vb2_dc_finish(void *buf_priv)
{
	struct vb2_dc_buf *buf = buf_priv;
	struct sg_table *sgt = buf->dma_sgt;

	/* This takes care of DMABUF and user-enforced cache sync hint */
	if (buf->vb->skip_cache_sync_on_finish)
		return;

	if (!buf->non_coherent_mem)
		return;

	/* Non-coherent MMAP only */
	if (buf->vaddr)
		invalidate_kernel_vmap_range(buf->vaddr, buf->size);

	/* For both USERPTR and non-coherent MMAP */
	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
}

If the memory type is USERPTR, vb2_dc_finish() calls dma_sync_sgtable_for_cpu(), unless one of the early returns above is taken.
This function synchronizes the cache for the CPU after the DMA transfer.
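
The two paths traced above pair up through the vb->synced flag: the prepare memop runs once when a buffer is queued, and the finish memop runs once when the driver completes it. The following is a toy userspace model of that pairing in C++, not kernel code. ToyBuffer and its counters are hypothetical names introduced only for illustration.

```cpp
// Toy userspace model of the vb2 prepare/finish pairing traced above.
// ToyBuffer and its counters are hypothetical; this is not kernel code.
struct ToyBuffer {
    bool synced = false;      // mirrors vb->synced
    int sync_for_device = 0;  // times the "prepare" memop would have run
    int sync_for_cpu = 0;     // times the "finish" memop would have run
};

// Models __vb2_buf_mem_prepare(): sync toward the device once per queue cycle.
inline void MemPrepare(ToyBuffer& b) {
    if (b.synced) return;  // already synced, nothing to do
    b.synced = true;
    ++b.sync_for_device;
}

// Models __vb2_buf_mem_finish(): sync toward the CPU once, only if prepared.
inline void MemFinish(ToyBuffer& b) {
    if (!b.synced) return;  // never prepared, nothing to undo
    b.synced = false;
    ++b.sync_for_cpu;
}
```

In this model, calling MemPrepare() twice in a row counts only one device-side sync, and MemFinish() counts a CPU-side sync only after a matching MemPrepare().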

Conclusion

With videobuf2-dma-contig, cache synchronization is automatically performed inside the Linux Kernel, so there is no need to explicitly sync u-dma-bufs on the user application side.

The cause of the problem is becoming more and more difficult to understand.

@zhanghongg
Author

Many thanks.

If the udmabuf mapping is made one buffer (buf_length_ bytes) larger than V4L2 requires, the problem no longer occurs.

That is, I map udmabuf with a size of (req_count+1) * buf_length_, while the number of buffers registered with V4L2 stays at req_count, each with a size of buf_length_.


bool V4l2FdUserPtr::ReqBufs(uint32_t req_count) {
    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);

    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count  = req_count;
    reqBufs.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){
        struct v4l2_buffer videoBuf;
        memset(&videoBuf, 0, sizeof(struct v4l2_buffer));
        videoBuf.index = i;
        videoBuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        videoBuf.memory = V4L2_MEMORY_USERPTR;
        if(ioctl(fd_, VIDIOC_QUERYBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QUERYBUF failed: %s", strerror(errno));
            return false;
        }
        /// Size of each buf
        // buf_length_ = videoBuf.length;
        printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);
        void* myPtr = reserved_memory_->GetMappAddr();
        video_buf_[i] = reinterpret_cast<unsigned char*>(myPtr) + i*buf_length_;
        videoBuf.m.userptr = reinterpret_cast<unsigned long>(video_buf_[i]);

        if(ioctl(fd_, VIDIOC_QBUF, &videoBuf)){
            log_error("v4l2 VIDIOC_QBUF failed: %s", strerror(errno));
            return false;
        }
    }

    return true;
}
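
The layout implied by the code above can be sketched with two small helpers. They only restate the pointer arithmetic in ReqBufs and say nothing about why the extra slot hides the problem. SlotOffset and SpareBytes are hypothetical names, not part of the application.

```cpp
#include <cstdint>

// Byte offset of V4L2 buffer `index` inside the u-dma-buf mapping,
// matching `myPtr + i*buf_length_` in ReqBufs above.
constexpr std::uint64_t SlotOffset(std::uint32_t index, std::uint64_t buf_length) {
    return static_cast<std::uint64_t>(index) * buf_length;
}

// Bytes of the mapping left unused after the last registered buffer.
// Assumes map_size >= req_count * buf_length.
constexpr std::uint64_t SpareBytes(std::uint64_t map_size, std::uint32_t req_count,
                                   std::uint64_t buf_length) {
    return map_size - SlotOffset(req_count, buf_length);
}
```

With the values reported later in this thread (4 buffers of 16,777,216 bytes), the last registered buffer starts at offset 50,331,648. A mapping of (4+1) buffers leaves 16,777,216 spare bytes after it, while a mapping of exactly 4 buffers leaves none.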

@ikwzm
Owner

ikwzm commented Feb 29, 2024

What is the next output result?

        printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);

What is the value of req_count?

What is the value of stripBufSize?

What is the next process? Do you have the source code?

    video_buf_.resize(req_count);

What is the next process? Do you have the source code?

    dq_video_bufs_.resize(req_count);

@zhanghongg
Author

What is the next output result?

    printf("VIDIOC_QUERYBUF videoBufSize: %d, SetFmtBufSize: %d\n", videoBuf.length, buf_length_);

Both are 16,777,216.

What is the value of req_count?

During the testing process, there were 4.

What is the value of stripBufSize?

It is also 16,777,216.

What is the next process? Do you have the source code?

video_buf_.resize(req_count);

It is only used in DqBuf.

unsigned char* V4l2FdUserPtr::DqBuf() {
    memset(&active_video_buf_, 0, sizeof(struct v4l2_buffer));
    active_video_buf_.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    active_video_buf_.memory = V4L2_MEMORY_USERPTR;
    if(ioctl(fd_, VIDIOC_DQBUF, &active_video_buf_)){
        log_error("v4l2 VIDIOC_DQBUF failed: %s", strerror(errno));
        return 0;
    }
    // reserved_memory_->Sync(buf_length_, active_video_buf_.index*buf_length_, 1);
    return video_buf_[active_video_buf_.index];
}

What is the next process? Do you have the source code?

dq_video_bufs_.resize(req_count);

Please ignore it; it is not currently used by any interface.

@ikwzm
Owner

ikwzm commented Mar 1, 2024

What is the value of req_count?

During the testing process, there were 4.

What is the value of reqBufs.count after the following ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)?

bool V4l2FdUserPtr::ReqBufs(uint32_t req_count) {
    uint64_t reserved_base_addr = 0;
    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);

    struct v4l2_requestbuffers reqBufs;
    memset(&reqBufs, 0, sizeof(struct v4l2_requestbuffers));
    reqBufs.count  = req_count;
    reqBufs.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    reqBufs.memory = V4L2_MEMORY_USERPTR;
    if (ioctl(fd_, VIDIOC_REQBUFS, &reqBufs)) {
        log_error("v4l2 VIDIOC_REQBUFS failed: %s", strerror(errno));
        return false;
    }
    video_buf_.resize(req_count);
    dq_video_bufs_.resize(req_count);
    for(int i = 0; i < reqBufs.count; i++){

Is it the same value as req_count?

@ikwzm
Owner

ikwzm commented Mar 1, 2024

    reserved_memory_ = std::make_shared<UdmaBuf>(reserved_base_addr, (req_count+1) * buf_length_);

Where is the source code for the UdmaBuf class?

@zhanghongg
Author

Is it the same value as req_count?

Yes, it is.

Where is the source code for the UdmaBuf class?

UdmaBuf::UdmaBuf(uint64_t base_addr, uint64_t map_size): base_addr_(base_addr), map_size_(map_size) {
    fd_ = open("/dev/udmabuf0", O_RDWR);
    if (fd_ < 0) {   
        log_error("UdmaBuf open failed!");
        assert(false);
    }
    user_ptr_ = mmap64(NULL, map_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, base_addr_);
    if(user_ptr_ != MAP_FAILED) {
        log_info("UdmaBuf mmap successfully!");
        ClearBuf();
    }
    else {
        log_error("UdmaBuf mmap failed!"); 
    }
}

@ikwzm
Owner

ikwzm commented Mar 5, 2024

Where is ClearBuf located?
Can you show me "all" of the source code for the UdmaBuf class, not just the constructor?

@zhanghongg
Author

#include "UdmaBuf.h"
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <assert.h>
#include "log.h"
#include <string.h>
UdmaBuf::UdmaBuf(uint64_t base_addr, uint64_t map_size): base_addr_(base_addr), map_size_(map_size) {
    fd_ = open("/dev/udmabuf0", O_RDWR);
    if (fd_ < 0) {   
        log_error("UdmaBuf open failed!");
        assert(false);
    }
    user_ptr_ = mmap64(NULL, map_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, base_addr_);
    if(user_ptr_ != MAP_FAILED) {
        log_info("UdmaBuf mmap successfully!");
        ClearBuf();
    }
    else {
        log_error("UdmaBuf mmap failed!"); 
    }
}

UdmaBuf::~UdmaBuf() {
    close(fd_);
    munmap(user_ptr_, map_size_);
}

void* UdmaBuf::GetMappAddr() {
    return user_ptr_;
}


void UdmaBuf::SetSyncMode(uint8_t mode){

    unsigned char attr[1024];
    int tmpfd_;
    unsigned long  sync_mode = mode;
    if ((tmpfd_  = open("/sys/class/u-dma-buf/udmabuf0/sync_mode", O_WRONLY)) != -1) {
        sprintf((char *)attr, "%lu", sync_mode);
        write(tmpfd_, attr, strlen((const char *)attr));
        close(tmpfd_);
    }
    else {
        log_error("SetSyncMode failed!"); 
    }
}

void UdmaBuf::ClearBuf() {
    /// no memset!!
    unsigned char* tmp = reinterpret_cast<unsigned char*>(user_ptr_);
    memset(tmp, 0, map_size_);
    // for(uint64_t i=0; i < map_size_; ++i) {
    //     memset(tmp,0,map_size);
    //     tmp[i] = 0;
    // }            
}

void UdmaBuf::Sync(unsigned long sync_size, unsigned long sync_offset,
                   unsigned int sync_direction) {
    int fd;
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_offset); /* or sprintf(attr, "0x%lx", sync_offset); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }
    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_size); /* or sprintf(attr, "0x%lx", sync_size); */
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%u", sync_direction);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_cpu = sync_direction == 1? 1:0;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_cpu);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

    {
        unsigned char  attr[1024];
        unsigned long  sync_for_device = sync_direction == 1? 0:1;
        if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
            sprintf((char *)attr, "%lu", sync_for_device);
            write(fd, attr, strlen((const char *)attr));
            close(fd);
        }
    }

}
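
Sync() above repeats the same open/format/write/close sequence for each sysfs attribute. A minimal sketch of a shared helper follows, assuming the attributes accept a decimal value as the code above already writes. WriteAttr is a hypothetical name; streaming the value avoids having to match a printf format to the argument type.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `value` in decimal to the file at `path`.
// Returns false if the file cannot be opened or the write fails.
inline bool WriteAttr(const std::string& path, unsigned long value) {
    std::ofstream out(path);
    if (!out) return false;
    out << value;
    out.flush();  // push the buffered text to the file before checking state
    return static_cast<bool>(out);
}
```

For example, `WriteAttr("/sys/class/u-dma-buf/udmabuf0/sync_offset", sync_offset)` could stand in for the first block of Sync(), and the return value makes a failed write visible to the caller.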

@ikwzm
Owner

ikwzm commented Mar 5, 2024

umm...
Did I make a bad point?
Where is UdmaBuf.h?

@zhanghongg
Author

Sorry.

#pragma once
#include <cstdint>
#include <stdint.h>

class UdmaBuf {
public:
    UdmaBuf(uint64_t base_addr, uint64_t map_size);
    virtual ~UdmaBuf();
    void* GetMappAddr();
    uint64_t GetMapSize() const {
        return map_size_;
    }
    void ClearBuf();

    void Sync(unsigned long sync_size, unsigned long sync_offset, unsigned int sync_direction);
protected:
     


    // As listed below, sync_mode can be used to configure the cache behavior when the O_SYNC flag is present in open():
    // sync_mode=0: CPU cache is enabled regardless of the O_SYNC flag presence.
    // sync_mode=1: If O_SYNC is specified, CPU cache is disabled. If O_SYNC is not specified, CPU cache is enabled.
    // sync_mode=2: If O_SYNC is specified, CPU cache is disabled but CPU uses write-combine when writing data to DMA buffer improves performance by combining multiple write accesses. If O_SYNC is not specified, CPU cache is enabled.
    // sync_mode=3: If O_SYNC is specified, DMA coherency mode is used. If O_SYNC is not specified, CPU cache is enabled.
    // sync_mode=4: CPU cache is enabled regardless of the O_SYNC flag presence.
    // sync_mode=5: CPU cache is disabled regardless of the O_SYNC flag presence.
    // sync_mode=6: CPU uses write-combine to write data to DMA buffer regardless of O_SYNC presence.
    // sync_mode=7: DMA coherency mode is used regardless of O_SYNC presence.
    void SetSyncMode(uint8_t mode);
private:
    uint64_t base_addr_;
    uint64_t map_size_;
    int fd_;
    void* user_ptr_;
};

@ikwzm
Owner

ikwzm commented Mar 5, 2024

Thank you!

@ikwzm
Owner

ikwzm commented Mar 6, 2024

The source code you gave us uses memset to clear the u-dma-buf. Have you checked the behavior without memset?
If not, please check the behavior again with a ClearBuf that does not use memset.

@zhanghongg
Copy link
Author

Yes, I have checked.
Nothing happened, and it takes a long time.

void UdmaBuf::ClearBuf() {
    /// no memset!!
    unsigned char* tmp = reinterpret_cast<unsigned char*>(user_ptr_);
    for(uint64_t i=0; i < map_size_; ++i) {
        tmp[i] = 0;
    }            
}

@ikwzm
Owner

ikwzm commented Mar 6, 2024

Yes, I have checked.
Nothing happened, and it takes a long time.

Nothing happened?
Has the buffer been cleared?
Did the value written by PL match the value read by the CPU?

@zhanghongg
Author

The behavior is the same as before.

Has the buffer been cleared?

Yes, it has been cleared.

Did the value written by PL match the value read by the CPU?

No, they did not match.

@zhanghongg
Author

I am very sorry to ask questions again. Previously, on the ZynqMP platform, my V4L2 USERPTR + U-DMA-BUF setup was running well. Now I am doing the same thing on the Zynq platform. The software code and logic are the same; the only difference is the system version.

Recall my previous logic:

  1. The PL side writes data to the U-DMA-BUF.
  2. The PS side receives the data and writes the last 8 bytes of the BUF as 0.
  3. Re-add the U-DMA-BUF to the queue available on the PL side.
  4. After the PL side obtains the U-DMA-BUF again, it judges whether the last 8 bytes are 0. If it is 0, it meets the expectation. If it is not 0, the error count is increased.

This logic runs normally on the ZynqMP platform. On the Zynq platform, I get different results:
There is a high probability that the last 8 bytes of the data received by the PL side are not 0.
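
The tail check in steps 2 and 4 can be sketched as follows. This is a minimal illustration on an ordinary byte buffer, assuming an 8-byte tail as described above. ClearTail and TailIsZero are hypothetical names, not taken from the actual application.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTailBytes = 8;  // "last 8 bytes" from the steps above

// Step 2: zero the last kTailBytes of a buffer of `len` bytes.
inline void ClearTail(std::uint8_t* buf, std::size_t len) {
    if (len < kTailBytes) return;  // buffer too small to have a tail
    std::fill(buf + len - kTailBytes, buf + len, std::uint8_t{0});
}

// Step 4: report whether the last kTailBytes are all zero.
inline bool TailIsZero(const std::uint8_t* buf, std::size_t len) {
    if (len < kTailBytes) return false;
    return std::all_of(buf + len - kTailBytes, buf + len,
                       [](std::uint8_t b) { return b == 0; });
}
```

On ordinary cached memory, TailIsZero() always succeeds right after ClearTail(). The failure described here only shows up after the buffer has gone through the queue and DMA path.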

Other test information:
After step 2 completes, I read back the last 8 bytes directly and confirm that they have indeed been set to 0.
But when I hand the buffer back to the PL side through V4L2, the last 8 bytes of the data received by the PL side are not 0. The PL then raises an interrupt, and V4L2 returns this buffer to the PS side again. When the PS side reads the last 8 bytes at that point, they are indeed no longer 0.

I am sure that my code logic is correct and it has been working normally on the ZynqMP platform. However, the problem encountered on the Zynq platform confuses me. I am really sorry to disturb you again.

@ikwzm
Owner

ikwzm commented Oct 23, 2024

Sorry, I know that V4L2_MEMORY_USERPTR + u-dma-buf will not work on newer versions of Linux.
The following fix was made to the Linux Kernel around December 2022 and applied starting with Linux Kernel 5.15.

As a result of this fix, the method of using V4L2_MEMORY_USERPTR by allocating a buffer with u-dma-buf no longer works.

The above post points out that V4L2_MEMORY_USERPTR was already deprecated and that the fallback to VM_PFNMAP/VM_IO mappings is fundamentally dangerous. Consequently, the restrictions on using userspace memory areas as V4L2 buffers appear to have been tightened, and buffers allocated with u-dma-buf can no longer be used.

With this change, I can no longer run V4L2_MEMORY_USERPTR + u-dma-buf in our environment. Therefore, I cannot reproduce your issue in my environment.

The problem of slow V4L2 running on ZynqMP/Zynq/RaspberryPi has a solution in newer Linux using dma-heap.
Here is an article describing this solution. It is written in Japanese, but please translate it into your language.

@zhanghongg
Author

Great! I have read your solution. I plan to try it in kernel version 5.15.

Could you please help me confirm whether my current driver can be adapted to this modification? Thank you very much.
