[Share][Daily Update][2024.03.26][CV_arxiv_papers]

[UPDATED!] 2024-03-26 (Publish Time)

Generative Models

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | SLEDGE: Synthesizing Simulation Environments for Driving Agents with Generative Models | Kashyap Chitta, Daniel Dauner, Andreas Geiger | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AID: Attention Interpolation of Text-to-Image Diffusion | Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos | Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Boosting Diffusion Models with Moving Average Sampling in Frequency Domain | Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields | Rüveyda Yilmaz, Dennis Eschweiler, Johannes Stegmaier | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Improving Text-to-Image Consistency via Automatic Prompt Optimization | Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | GenesisTex: Adapting Image Denoising Diffusion to Texture Space | Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation | Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Makeup Prior Models for 3D Facial Makeup Estimation and Applications | Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Huawei Wei, Zejun Yang, Zhisheng Wang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Manifold-Guided Lyapunov Control with Diffusion Models | Amartya Mukherjee, Thanin Quartz, Jun Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes | Uri Hacohen, Adi Haviv, Shahar Sarfaty, Bruria Friedman, Niva Elkin-Koren, Roi Livni, Amit H Bermano | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation | Qilin Wang, Jiangning Zhang, Chengming Xu, Weijian Cao, Ying Tai, Yue Han, Yanhao Ge, Hong Gu, Chengjie Wang, Yanwei Fu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AniArtAvatar: Animatable 3D Art Avatar from a Single Image | Shaoxu Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Practical Applications of Advanced Cloud Services and Generative AI Systems in Medical Image Analysis | Jingyu Xu, Binbin Wu, Jiaxin Huang, Yulu Gong, Yifan Zhang, Bo Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder | Dihan Zheng, Yihang Zou, Xiaowen Zhang, Chenglong Bao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DiffGaze: A Diffusion Model for Continuous Gaze Sequence Generation on 360° Images | Chuhan Jiao, Yao Wang, Guanhua Zhang, Mihai Bâce, Zhiming Hu, Andreas Bulling | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection | Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model | Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion | Jihyun Lee, Shunsuke Saito, Giljoo Nam, Minhyuk Sung, Tae-Kyun Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Neural Clustering based Visual Representation Learning | Guikun Chen, Xia Li, Yi Yang, Wenguan Wang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance | Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Tracing and segmentation of molecular patterns in 3-dimensional cryo-et/em density maps through algorithmic image processing and deep learning-based techniques | Salim Sazzed | arxiv.org/pdf/2403.17… | null |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis | Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | ReMamber: Referring Image Segmentation with Mamba Twister | Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Assessment of Multimodal Large Language Models in Alignment with Human Values | Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security Applications | Fouad Trad, Ali Chehab | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting | Adrien Lafage, Mathieu Barbier, Gianni Franchi, David Filliat | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors | He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | Shun Inadumi, Seiya Kawano, Akishige Yuguchi, Yasutomo Kawanishi, Koichiro Yoshino | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Language Models are Free Boosters for Biomedical Imaging Tasks | Zhixin Lai, Jing Wu, Suiyao Chen, Yucheng Zhou, Anna Hovakimyan, Naira Hovakimyan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation | Ganlong Zhao, Guanbin Li, Weikai Chen, Yizhou Yu | arxiv.org/pdf/2403.17… | null |

NeRF

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians | Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation | Jiahao Chen, Yipeng Qin, Lingjie Liu, Jiangbo Lu, Guanbin Li | arxiv.org/pdf/2403.17… | null |

3DGS

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | 2D Gaussian Splatting for Geometrically Accurate Radiance Fields | Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing | Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala | arxiv.org/pdf/2403.17… | null |

Model Compression/Optimization

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | Superior and Pragmatic Talking Face Generation with Teacher-Student Framework | Chao Liang, Jianwen Jiang, Tianyun Zhong, Gaojie Lin, Zhengkun Rong, Jiaqi Yang, Yongming Zhu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Chain of Compression: A Systematic Approach to Combinationally Compress Convolutional Neural Networks | Yingtao Shen, Minqing Sun, Jie Zhao, An Zou | arxiv.org/pdf/2403.17… | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation | Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | FastCAR: Fast Classification And Regression Multi-Task Learning via Task Consolidation for Modelling a Continuous Property Variable of Object Classes | Anoop Kini, Andreas Jansche, Timo Bernthaler, Gerhard Schneider | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | CMP: Cooperative Motion Prediction with Multi-Agent Communication | Zhuoyuan Wu, Yuping Wang, Hengbo Ma, Zhaowei Li, Hang Qiu, Jiachen Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection | Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Sen2Fire: A Challenging Benchmark Dataset for Wildfire Detection using Sentinel Data | Yonghao Xu, Amanda Berg, Leif Haglund | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Deepfake Generation and Detection: A Benchmark and Survey | Gan Pei, Jiangning Zhang, Menghan Hu, Guangtao Zhai, Chengjie Wang, Zhenyu Zhang, Jian Yang, Chunhua Shen, Dacheng Tao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation | Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, Wolfram Burgard | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities | Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, Enis Simsar, Mehmet Simsar, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Noise2Noise Denoising of CRISM Hyperspectral Data | Robert Platt, Rossella Arcucci, Cédric John | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Paired Diffusion: Generation of related, synthetic PET-CT-Segmentation scans using Linked Denoising Diffusion Probabilistic Models | Rowan Bradbury, Katherine A. Vallis, Bartlomiej W. Papiez | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts | Kazuki Kawamura, Jun Rekimoto | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Deep Learning for Segmentation of Cracks in High-Resolution Images of Steel Bridges | Andrii Kompanets, Gautam Pai, Remco Duits, Davide Leonetti, Bert Snijder | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark | Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection | Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | The Solution for the CVPR 2023 1st foundation model challenge-Track2 | Haonan Xu, Yurui Huang, Sishun Pan, Zhihao Guan, Yi Xu, Yang Yang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation | Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps | Maciej K Wozniak, Mattias Hansson, Marko Thiel, Patric Jensfelt | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets | Patrick Grommelt, Louis Weiss, Franz-Josef Pfreundt, Janis Keuper | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models | Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification | Eva Pachetti, Sotirios A. Tsaftaris, Sara Colantonio | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Random-coupled Neural Network | Haoran Liu, Mingzhe Liu, Peng Li, Jiahui Wu, Xin Jiang, Zhuo Zuo, Bingqi Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments | Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato, Zihao Zhu, Bo Zheng | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion | Kazi Shahriar Sanjid, Md. Tanzim Hossain, Md. Shakib Shahariar Junayed, Dr. Mohammad Monir Uddin | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching | Shohei Enomoto, Naoya Hasegawa, Kazuki Adachi, Taku Sasaki, Shin'ya Yamaguchi, Satoshi Suzuki, Takeharu Eda | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | SSF3D: Strict Semi-Supervised 3D Object Detection with Switching Filter | Songbur Wong | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection | Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving | Mingfu Liang, Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Activity-Biometrics: Person Identification from Daily Activities | Shehreen Azad, Yogesh Singh Rawat | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Staircase Localization for Autonomous Exploration in Urban Environments | Jinrae Kim, Sunggoo Jung, Sung-Kyun Kim, Youdan Kim, Ali-akbar Agha-mohammadi | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer | Jeong-Yoon Kim, Seung-Ho Lee | arxiv.org/pdf/2403.17… | null |

OCR

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang | arxiv.org/pdf/2403.17… | null |

Image Understanding

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | Track Everything Everywhere Fast and Robustly | Yunzhou Song, Jiahui Lei, Ziyun Wang, Lingjie Liu, Kostas Daniilidis | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam, Kevin Desai | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Predicting Perceived Gloss: Do Weak Labels Suffice? | Julia Guerrero-Viu, J. Daniel Subias, Ana Serrano, Katherine R. Storrs, Roland W. Fleming, Belen Masia, Diego Gutierrez | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | Junhao Zheng, Chenhao Lin, Jiahao Sun, Zhengyu Zhao, Qian Li, Chao Shen | arxiv.org/pdf/2403.17… | null |

Transformer

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | Towards Explaining Hypercomplex Neural Networks | Eleonora Lopez, Eleonora Grassucci, Debora Capriotti, Danilo Comminiello | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models | Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Panonut360: A Head and Eye Tracking Dataset for Panoramic Video | Yutong Xu, Junhao Du, Jiahe Wang, Yuwei Ning, Sihan Zhou, Yang Cao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | High-Resolution Image Translation Model Based on Grayscale Redefinition | Xixian Wu, Dian Chao, Yang Yang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Grad-CAMO: Learning Interpretable Single-Cell Morphological Profiles from 3D Cell Painting Images | Vivek Gopalakrishnan, Jingzhe Ma, Zhiyong Xie | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation | Sicong Zang, Zhijun Fang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos | Yufu Wang, Ziyun Wang, Lingjie Liu, Kostas Daniilidis | arxiv.org/pdf/2403.17… | null |

3D/CG

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | TC4D: Trajectory-Conditioned Text-to-4D Generation | Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of 3D Transfer Learning | Souhail Hadgi, Lei Li, Maks Ovsjanikov | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Towards 3D Vision with Low-Cost Single-Photon Cameras | Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection | Sihan Shang, Jiancheng Yang, Zhenglong Sun, Pascal Fua | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Yuqi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency | Yingjie Xu, Bangzhen Liu, Hao Tang, Bailin Deng, Shengfeng He | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping | Kutay Yılmaz, Matthias Nießner, Anastasiia Kornilova, Alexey Artemov | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | WordRobe: Text-Guided Generation of Textured 3D Garments | Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma | arxiv.org/pdf/2403.17… | null |

Learning Paradigms

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning | Huiping Zhuang, Run He, Kai Tong, Ziqian Zeng, Cen Chen, Zhiping Lin | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning | Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xiangwei Zhu, Zhenming Ji | arxiv.org/pdf/2403.17… | link |

Others

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-03-26 | Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2 | Chen Yiwei, Tang Chao, Aghabiglou Amir, Chu Chung San, Wiaux Yves | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Low-Latency Neural Stereo Streaming | Qiqi Hou, Farzad Farhadzadeh, Amir Said, Guillaume Sautiere, Hoang Le | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders | Alexandre Eymaël, Renaud Vandeghen, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | MUTE-SLAM: Real-Time Neural SLAM with Multiple Tri-Plane Hash Representations | Yifan Yan, Ruomin He, Zhenghua Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Boosting Adversarial Training via Fisher-Rao Norm-based Regularization | Xiangyu Yin, Wenjie Ruan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | Philipp Sadler, Sherzod Hakimov, David Schlangen | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge | Dongjin Kim, Sung Jin Um, Sangmin Lee, Jung Uk Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Labeling subtypes in a Parkinson's Cohort using Multifeatures in MRI - Integrating Grey and White Matter Information | Tanmayee Samantaray, Jitender Saini, Pramod Kumar Pal, Bithiah Grace Jaganathan, Vijaya V Saradhi, Gupta CN | arxiv.org/pdf/2403.17… | null |