[UPDATED!] 2024-03-26 (Publish Time)
Generative Models
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | SLEDGE: Synthesizing Simulation Environments for Driving Agents with Generative Models | SLEDGE:使用生成模型综合驾驶代理的模拟环境 | Kashyap Chitta, Daniel Dauner, Andreas Geiger | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AID: Attention Interpolation of Text-to-Image Diffusion | AID:文本到图像扩散的注意力插值 | Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos | 利用近场照明从内窥镜视频中进行单眼深度估计 | Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Boosting Diffusion Models with Moving Average Sampling in Frequency Domain | 使用频域移动平均采样增强扩散模型 | Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | DiffH2O:基于扩散的文本描述手-物体交互合成 | Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields | 使用去噪扩散概率模型和流场生成带注释的生物医学视频 | Rüveyda Yilmaz, Dennis Eschweiler, Johannes Stegmaier | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Improving Text-to-Image Consistency via Automatic Prompt Optimization | 通过自动提示优化提高文本到图像的一致性 | Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | GenesisTex: Adapting Image Denoising Diffusion to Texture Space | GenesisTex:使图像去噪扩散适应纹理空间 | Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation | CT 合成与条件扩散模型用于腹部淋巴结分割 | Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Makeup Prior Models for 3D Facial Makeup Estimation and Applications | 用于 3D 面部化妆估计和应用的化妆先验模型 | Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | AniPortrait:音频驱动的真实肖像动画合成 | Huawei Wei, Zejun Yang, Zhisheng Wang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Manifold-Guided Lyapunov Control with Diffusion Models | 具有扩散模型的流形引导李雅普诺夫控制 | Amartya Mukherjee, Thanin Quartz, Jun Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes | 并非所有相似之处都是平等的:利用数据驱动的偏见来告知 GenAI 版权纠纷 | Uri Hacohen, Adi Haviv, Shahar Sarfaty, Bruria Friedman, Niva Elkin-Koren, Roi Livni, Amit H Bermano | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation | DiffFAE:通过空间敏感的定制和语义保留推进高保真一次性面部外观编辑 | Qilin Wang, Jiangning Zhang, Chengming Xu, Weijian Cao, Ying Tai, Yue Han, Yanhao Ge, Hong Gu, Chengjie Wang, Yanwei Fu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AniArtAvatar: Animatable 3D Art Avatar from a Single Image | AniArtAvatar:来自单个图像的可动画 3D 艺术头像 | Shaoxu Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Practical Applications of Advanced Cloud Services and Generative AI Systems in Medical Image Analysis | 高级云服务和生成式人工智能系统在医学图像分析中的实际应用 | Jingyu Xu, Binbin Wu, Jiaxin Huang, Yulu Gong, Yifan Zhang, Bo Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder | SeNM-VAE:使用分层变分自动编码器的半监督噪声建模 | Dihan Zheng, Yihang Zou, Xiaowen Zhang, Chenglong Bao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DiffGaze: A Diffusion Model for Continuous Gaze Sequence Generation on 360° Images | DiffGaze:用于在 360° 图像上生成连续注视序列的扩散模型 | Chuhan Jiao, Yao Wang, Guanhua Zhang, Mihai Bâce, Zhiming Hu, Andreas Bulling | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection | LaRE^2:基于潜在重建误差的扩散生成图像检测方法 | Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model | 建立跨空间和时间分辨率的桥梁:通过变化先验和条件扩散模型实现基于参考的超分辨率 | Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion | InterHandGen:通过级联反向扩散生成双手交互 | Jihyun Lee, Shunsuke Saito, Giljoo Nam, Minhyuk Sung, Tae-Kyun Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Neural Clustering based Visual Representation Learning | 基于神经聚类的视觉表示学习 | Guikun Chen, Xia Li, Yi Yang, Wenguan Wang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance | 具有扰动注意力引导的自校正扩散采样 | Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Tracing and segmentation of molecular patterns in 3-dimensional cryo-et/em density maps through algorithmic image processing and deep learning-based techniques | 通过算法图像处理和基于深度学习的技术对 3 维冷冻电子/电子密度图中的分子模式进行追踪和分割 | Salim Sazzed | arxiv.org/pdf/2403.17… | null |
Multimodal
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis | ConvoFusion:用于协同语音手势合成的多模态会话扩散 | Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | ReMamber: Referring Image Segmentation with Mamba Twister | ReMamber:使用 Mamba Twister 进行图像分割 | Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Assessment of Multimodal Large Language Models in Alignment with Human Values | 评估符合人类价值观的多模态大语言模型 | Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security Applications | 评估快速设计的大型多模态模型与微调视觉转换器在基于图像的安全应用中的效果 | Fouad Trad, Ali Chehab | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting | 用于多模态轨迹预测的分层光变换器集成 | Adrien Lafage, Mathieu Barbier, Gianni Franchi, David Filliat | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors | MMVP:带有视觉和压力传感器的多模式 MoCap 数据集 | He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | 用于澄清模糊日语问题的基于凝视的视觉问答数据集 | Shun Inadumi, Seiya Kawano, Akishige Yuguchi, Yasutomo Kawanishi, Koichiro Yoshino | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Language Models are Free Boosters for Biomedical Imaging Tasks | 语言模型是生物医学成像任务的免费助推器 | Zhixin Lai, Jing Wu, Suiyao Chen, Yucheng Zhou, Anna Hovakimyan, Naira Hovakimyan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation | OVER-NAV:通过开放词汇检测和结构化表示提升迭代视觉和语言导航 | Ganlong Zhao, Guanbin Li, Weikai Chen, Yizhou Yu | arxiv.org/pdf/2403.17… | null |
NeRF
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians | Octree-GS:利用 LOD 结构的 3D 高斯实现一致的实时渲染 | Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation | NeRF-HuGS:使用启发式引导分割改进非静态场景中的神经辐射场 | Jiahao Chen, Yipeng Qin, Lingjie Liu, Jiangbo Lu, Guanbin Li | arxiv.org/pdf/2403.17… | null |
3DGS
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | 2D Gaussian Splatting for Geometrically Accurate Radiance Fields | 用于几何精确辐射场的二维高斯溅射 | Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing | DN-Splatter:高斯泼溅和网格划分的深度和法线先验 | Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala | arxiv.org/pdf/2403.17… | null |
Model Compression/Optimization
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | Superior and Pragmatic Talking Face Generation with Teacher-Student Framework | 利用师生框架生成优质实用的说话人脸 | Chao Liang, Jianwen Jiang, Tianyun Zhong, Gaojie Lin, Zhengkun Rong, Jiaqi Yang, Yongming Zhu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | 探索动态变压器以实现高效的对象跟踪 | Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Chain of Compression: A Systematic Approach to Combinationally Compress Convolutional Neural Networks | 压缩链:组合压缩卷积神经网络的系统方法 | Yingtao Shen, Minqing Sun, Jie Zhao, An Zou | arxiv.org/pdf/2403.17… | null |
Classification/Detection/Recognition/Segmentation/...
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | Efficient Video Object Segmentation via Modulated Cross-Attention Memory | 通过调制交叉注意力记忆进行高效视频对象分割 | Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | OmniVid:通用视频理解的生成框架 | Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation | AiOS:多合一阶段富有表现力的人体姿势和形状估计 | Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | FastCAR: Fast Classification And Regression Multi-Task Learning via Task Consolidation for Modelling a Continuous Property Variable of Object Classes | FastCAR:通过任务合并进行快速分类和回归多任务学习,用于对对象类的连续属性变量进行建模 | Anoop Kini, Andreas Jansche, Timo Bernthaler, Gerhard Schneider | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | CMP: Cooperative Motion Prediction with Multi-Agent Communication | CMP:多智能体通信的协作运动预测 | Zhuoyuan Wu, Yuping Wang, Hengbo Ma, Zhaowei Li, Hang Qiu, Jiachen Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection | ELGC-Net:用于遥感变化检测的高效局部全局上下文聚合 | Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Sen2Fire: A Challenging Benchmark Dataset for Wildfire Detection using Sentinel Data | Sen2Fire:使用 Sentinel 数据进行野火检测的具有挑战性的基准数据集 | Yonghao Xu, Amanda Berg, Leif Haglund | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Deepfake Generation and Detection: A Benchmark and Survey | Deepfake 生成和检测:基准和调查 | Gan Pei, Jiangning Zhang, Menghan Hu, Guangtao Zhai, Chengjie Wang, Zhenyu Zhang, Jian Yang, Chunhua Shen, Dacheng Tao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation | 用于基于语言的机器人导航的分层开放词汇 3D 场景图 | Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, Wolfram Burgard | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | GTA-HDR:用于 HDR 图像重建的大规模合成数据集 | Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities | 利用胸部 CT 体积和放射学报告进行监督级零样本异常检测的基础模型 | Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, Enis Simsar, Mehmet Simsar, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Noise2Noise Denoising of CRISM Hyperspectral Data | CRISM 高光谱数据的 Noise2Noise 去噪 | Robert Platt, Rossella Arcucci, Cédric John | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Paired Diffusion: Generation of related, synthetic PET-CT-Segmentation scans using Linked Denoising Diffusion Probabilistic Models | 配对扩散:使用链接去噪扩散概率模型生成相关的合成 PET-CT 分割扫描 | Rowan Bradbury, Katherine A. Vallis, Bartlomiej W. Papiez | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts | FastPerson:通过保留语言和视觉上下文的有效视频摘要来增强视频学习 | Kazuki Kawamura, Jun Rekimoto | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Deep Learning for Segmentation of Cracks in High-Resolution Images of Steel Bridges | 钢桥高分辨率图像中裂缝分割的深度学习 | Andrii Kompanets, Gautam Pai, Remco Duits, Davide Leonetti, Bert Snijder | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark | 不可见气体检测:RGB-热交叉注意力网络和新基准 | Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection | 基于 Transformer 的视觉关系检测的分组查询专业化和质量感知多重分配 | Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | The Solution for the CVPR 2023 1st foundation model challenge-Track2 | CVPR 2023第一届基础模型挑战赛解决方案-Track2 | Haonan Xu, Yurui Huang, Sishun Pan, Zhihao Guan, Yi Xu, Yang Yang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation | 旋转扫描:带有 Triplet SSM 模块的类似 UNet 的 Mamba,用于医学图像分割 | Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | PlainMamba:改进视觉识别中的非分层 Mamba | Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps | UADA3D:利用稀疏 LiDAR 和大域间隙进行 3D 物体检测的无监督对抗域适应 | Maciej K Wozniak, Mattias Hansson, Marko Thiel, Patric Jensfelt | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets | 假的还是 JPEG?揭示生成的图像检测数据集中的常见偏差 | Patrick Grommelt, Louis Weiss, Franz-Josef Pfreundt, Janis Keuper | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models | 双记忆网络:视觉语言模型的多功能适应方法 | Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification | 通过解开的自监督学习和医学图像分类元学习来促进少样本学习 | Eva Pachetti, Sotirios A. Tsaftaris, Sara Colantonio | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Random-coupled Neural Network | 随机耦合神经网络 | Haoran Liu, Mingzhe Liu, Peng Li, Jiahui Wu, Xin Jiang, Zhuo Zuo, Bingqi Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments | Dr.Hair:通过线段的可微渲染无需预训练即可重建头皮连接的发丝 | Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato, Zihao Zhu, Bo Zheng | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion | 集成曼巴序列模型和分层上采样网络实现多发性硬化症军团的精确语义分割 | Kazi Shahriar Sanjid, Md. Tanzim Hossain, Md. Shakib Shahariar Junayed, Dr. Mohammad Monir Uddin | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching | 测试时间适应与图像增强的结合:通过不确定性感知 Logit 切换提高准确性 | Shohei Enomoto, Naoya Hasegawa, Kazuki Adachi, Taku Sasaki, Shin'ya Yamaguchi, Satoshi Suzuki, Takeharu Eda | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | SSF3D: Strict Semi-Supervised 3D Object Detection with Switching Filter | SSF3D:使用切换滤波器的严格半监督 3D 物体检测 | Songbur Wong | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection | 用于半监督单目 3D 物体检测的解耦伪标记 | Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving | AIDE:自动驾驶中物体检测的自动数据引擎 | Mingfu Liang, Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Activity-Biometrics: Person Identification from Daily Activities | 活动生物识别:日常活动中的人员识别 | Shehreen Azad, Yogesh Singh Rawat | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Staircase Localization for Autonomous Exploration in Urban Environments | 城市环境中自主探索的楼梯定位 | Jinrae Kim, Sunggoo Jung, Sung-Kyun Kim, Youdan Kim, Ali-akbar Agha-mohammadi | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer | 利用时间频率相关性和通过知识迁移学习位置信息的频谱图语音情感识别的准确性增强方法 | Jeong-Yoon Kim, Seung-Ho Lee | arxiv.org/pdf/2403.17… | null |
OCR
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | ICCV 2023 第一届科学图形字幕挑战赛的解决方案 | Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang | arxiv.org/pdf/2403.17… | null |
Image Understanding
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | Track Everything Everywhere Fast and Robustly | 快速、稳健地跟踪任何地方的一切 | Yunzhou Song, Jiahui Lei, Ziyun Wang, Lingjie Liu, Kostas Daniilidis | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | 3D 以自我为中心的人体姿势估计调查 | Md Mushfiqur Azam, Kevin Desai | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Predicting Perceived Gloss: Do Weak Labels Suffice? | 预测感知光泽:弱标签就足够了吗? | Julia Guerrero-Viu, J. Daniel Subias, Ana Serrano, Katherine R. Storrs, Roland W. Fleming, Belen Masia, Diego Gutierrez | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | 针对自动驾驶中单目深度估计的物理 3D 对抗攻击 | Junhao Zheng, Chenhao Lin, Jiahao Sun, Zhengyu Zhao, Qian Li, Chao Shen | arxiv.org/pdf/2403.17… | null |
Transformer
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | Towards Explaining Hypercomplex Neural Networks | 解释超复杂神经网络 | Eleonora Lopez, Eleonora Grassucci, Debora Capriotti, Danilo Comminiello | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models | Serpent:通过多尺度结构化状态空间模型进行可扩展且高效的图像恢复 | Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Panonut360: A Head and Eye Tracking Dataset for Panoramic Video | Panonut360:全景视频的头部和眼睛跟踪数据集 | Yutong Xu, Junhao Du, Jiahe Wang, Yuwei Ning, Sihan Zhou, Yang Cao | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | High-Resolution Image Translation Model Based on Grayscale Redefinition | 基于灰度重定义的高分辨率图像翻译模型 | Xixian Wu, Dian Chao, Yang Yang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Grad-CAMO: Learning Interpretable Single-Cell Morphological Profiles from 3D Cell Painting Images | Grad-CAMO:从 3D 细胞绘画图像中学习可解释的单细胞形态特征 | Vivek Gopalakrishnan, Jingzhe Ma, Zhiyong Xie | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation | 为草图补丁配备上下文感知位置编码以实现图形草图表示 | Sicong Zang, Zhijun Fang | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos | TRAM:野外视频中 3D 人体的全局轨迹和运动 | Yufu Wang, Ziyun Wang, Lingjie Liu, Kostas Daniilidis | arxiv.org/pdf/2403.17… | null |
3D/CG
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | TC4D: Trajectory-Conditioned Text-to-4D Generation | TC4D:轨迹条件文本到 4D 生成 | Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, et al. | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of 3D Transfer Learning | 监督或不监督:理解和解决 3D 迁移学习的主要挑战 | Souhail Hadgi, Lei Li, Maks Ovsjanikov | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Towards 3D Vision with Low-Cost Single-Photon Cameras | 利用低成本单光子相机迈向 3D 视觉 | Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection | DataCook:为医疗保健数据版权保护制作反对抗示例 | Sihan Shang, Jiancheng Yang, Zhenglong Sun, Pascal Fua | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Multi-Task Dense Prediction via Mixture of Low-Rank Experts | 通过低阶专家混合的多任务密集预测 | Yuqi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency | 不可靠的学习:具有相对几何一致性的快速少样本体素辐射场 | Yingjie Xu, Bangzhen Liu, Hao Tang, Bailin Deng, Shengfeng He | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping | DeepMIF:用于大规模 LiDAR 3D 测绘的深度单调隐式场 | Kutay Yılmaz, Matthias Nießner, Anastasiia Kornilova, Alexey Artemov | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | WordRobe: Text-Guided Generation of Textured 3D Garments | WordRobe:文本引导生成纹理 3D 服装 | Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma | arxiv.org/pdf/2403.17… | null |
Learning Paradigms
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning | DS-AL:用于无示例类增量学习的双流分析学习 | Huiping Zhuang, Run He, Kai Tong, Ziqian Zeng, Cen Chen, Zhiping Lin | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning | CoDA:具有严重性感知视觉提示调整的指导性域链适应 | Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xiangwei Zhu, Zhenming Ji | arxiv.org/pdf/2403.17… | link |
Other
| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-03-26 | Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2 | 使用 R2D2 进行可扩展非笛卡尔磁共振成像 | Chen Yiwei, Tang Chao, Aghabiglou Amir, Chu Chung San, Wiaux Yves | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Low-Latency Neural Stereo Streaming | 低延迟神经立体声流 | Qiqi Hou, Farzad Farhadzadeh, Amir Said, Guillaume Sautiere, Hoang Le | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders | 使用暹罗裁剪蒙版自动编码器进行高效图像预训练 | Alexandre Eymaël, Renaud Vandeghen, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | MUTE-SLAM: Real-Time Neural SLAM with Multiple Tri-Plane Hash Representations | MUTE-SLAM:具有多个三平面哈希表示的实时神经 SLAM | Yifan Yan, Ruomin He, Zhenghua Liu | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Boosting Adversarial Training via Fisher-Rao Norm-based Regularization | 通过基于 Fisher-Rao 范数的正则化促进对抗训练 | Xiangyu Yin, Wenjie Ruan | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | 分担成功成本:评估和学习协作多智能体指令给予和遵循策略的游戏 | Philipp Sadler, Sherzod Hakimov, David Schlangen | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge | 学习在没有先验声源知识的情况下从混合物中视觉定位声源 | Dongjin Kim, Sung Jin Um, Sangmin Lee, Jung Uk Kim | arxiv.org/pdf/2403.17… | null |
| 2024-03-26 | Labeling subtypes in a Parkinson's Cohort using Multifeatures in MRI - Integrating Grey and White Matter Information | 使用 MRI 中的多特征标记帕金森病队列中的亚型 - 整合灰质和白质信息 | Tanmayee Samantaray, Jitender Saini, Pramod Kumar Pal, Bithiah Grace Jaganathan, Vijaya V Saradhi, Gupta CN | arxiv.org/pdf/2403.17… | null |