
Advertising on the Telegram channel «Data Science | Machine Learning for Researchers»
Rating: 4.6
Category: Computer science
Language: English
Buy advertising in this channel
Placement Format:
- 1/24 (1 hour in the top / 24 hours in the feed)
- 2/48
- 3/72
- Native
- 7 days
- Forwards
Quantity
- 1
- 2
- 3
- 4
- 5
- 8
- 10
- 15
Advertising publication cost: $6.00
Discount: 0.0%
Remaining at this price: 0
Recent Channel Posts
Article Title:
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Article Date: 24 May 2025
Article Description:
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.
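For readers who want to try the setting this plugin targets, the sketch below shows the usual diffusers workflow of running Flux image-to-image with an arbitrary style LoRA attached; the consistency module itself would be loaded on top of this. The LoRA repo name and input image are placeholders, and the OmniConsistency loading step is only indicated as a comment (see the project's GitHub for the actual usage).

import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

# Base Flux image-to-image pipeline (the setting OmniConsistency plugs into)
pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Attach an arbitrary style LoRA; "your-style-lora" is a placeholder, not a real repo
pipe.load_lora_weights("your-style-lora", adapter_name="style")

# The OmniConsistency plugin would be loaded here as an extra module;
# see https://github.com/showlab/omniconsistency for the actual loading code.

image = load_image("photo.jpg")  # placeholder input photo
out = pipe(prompt="illustration in the loaded style", image=image, strength=0.9).images[0]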
PDF Download Link:
https://arxiv.org/pdf/2505.18445v1.pdf
GitHub:
• https://github.com/showlab/omniconsistency
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 125 · 06.06.2025, 13:58
Article Title:
ImgEdit: A Unified Image Editing Dataset and Benchmark
Article Date: 26 May 2025
Article Description:
Recent advancements in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily due to limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and editing prompt; it outperforms existing open-source models on multiple tasks, highlighting the value of ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image-editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate both open-source and proprietary models, as well as ImgEdit-E1, providing deep analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available at https://github.com/PKU-YuanGroup/ImgEdit.
PDF Download Link:
https://arxiv.org/pdf/2505.20275v1.pdf
GitHub:
• https://github.com/pku-yuangroup/imgedit
Datasets:
• MagicBrush
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 446 · 06.06.2025, 04:23
Article Title:
MTGS: Multi-Traversal Gaussian Splatting
Article Date: 16 Mar 2025
Article Description:
Multi-traversal data, commonly collected through daily commutes or by self-driving fleets, provides multiple viewpoints for scene reconstruction within a road block. This data offers significant potential for high-quality novel view synthesis, which is crucial for applications such as autonomous vehicle simulators. However, inherent challenges in multi-traversal data often result in suboptimal reconstruction quality, including variations in appearance and the presence of dynamic objects. To address these issues, we propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data by modeling a shared static geometry while separately handling dynamic elements and appearance variations. Our method employs a multi-traversal dynamic scene graph with a shared static node and traversal-specific dynamic nodes, complemented by color-correction nodes with learnable spherical-harmonics coefficient residuals. This approach enables high-fidelity novel view synthesis and provides flexibility to navigate any viewpoint. We conduct extensive experiments on a large-scale driving dataset, nuPlan, with multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines. The code and data will be made available to the public.
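The scene-graph idea described above (one shared static node, per-traversal dynamic nodes, and learnable color-correction residuals) can be pictured with a small data-structure sketch; the class and field names below are illustrative assumptions, not the released MTGS code.

from dataclasses import dataclass, field
import torch

@dataclass
class GaussianNode:
    means: torch.Tensor        # (N, 3) Gaussian centers
    sh_coeffs: torch.Tensor    # (N, K, 3) spherical-harmonics color coefficients
    opacities: torch.Tensor    # (N,)

@dataclass
class TraversalNodes:
    dynamic: list[GaussianNode]     # one node per moving object in this traversal
    sh_residual: torch.Tensor       # learnable color-correction residual for appearance changes

@dataclass
class MultiTraversalSceneGraph:
    static: GaussianNode            # shared static geometry across all traversals
    traversals: dict[str, TraversalNodes] = field(default_factory=dict)

    def gaussians_for(self, traversal_id: str):
        t = self.traversals[traversal_id]
        # Render-time set: shared static Gaussians (colored with this traversal's
        # residual) plus the traversal-specific dynamic Gaussians.
        return self.static, t.dynamic, t.sh_residual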
PDF Download Link:
https://arxiv.org/pdf/2503.12552v3.pdf
GitHub:
• https://github.com/OpenDriveLab/MTGS
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 645 · 05.06.2025, 14:14
Article Title:
Harnessing the Universal Geometry of Embeddings
Article Date: 18 May 2025
Article Description:
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity across model pairs with different architectures, parameter counts, and training datasets. The ability to translate unknown embeddings into a different space while preserving their geometry has serious implications for the security of vector databases. An adversary with access only to embedding vectors can extract sensitive information about the underlying documents, sufficient for classification and attribute inference.
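The translation step described here (embedding of model A → universal latent → embedding space of model B) can be sketched with two small adapter networks; the architecture, dimensions, and names below are illustrative assumptions, not the released vec2vec code.

import torch
import torch.nn as nn

class EmbeddingTranslator(nn.Module):
    """Illustrative translator: space A -> shared latent -> space B."""
    def __init__(self, dim_a: int, dim_b: int, latent_dim: int = 512):
        super().__init__()
        # Adapter into the (assumed) universal latent space
        self.encode_a = nn.Sequential(nn.Linear(dim_a, latent_dim), nn.SiLU(),
                                      nn.Linear(latent_dim, latent_dim))
        # Adapter out of the latent into model B's embedding space
        self.decode_b = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.SiLU(),
                                      nn.Linear(latent_dim, dim_b))

    def forward(self, emb_a: torch.Tensor) -> torch.Tensor:
        return self.decode_b(self.encode_a(emb_a))

# Usage: translate a batch of 768-d embeddings into a 1024-d target space
translator = EmbeddingTranslator(dim_a=768, dim_b=1024)
emb_a = torch.randn(8, 768)
emb_b_hat = translator(emb_a)  # (8, 1024)
# Quality is typically judged by cosine similarity to the true target embeddings
cos = nn.functional.cosine_similarity(emb_b_hat, torch.randn(8, 1024), dim=-1)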
PDF Download Link:
https://arxiv.org/pdf/2505.12540v2.pdf
GitHub:
• https://github.com/rjha18/vec2vec
• https://github.com/zhaoolee/garss
Datasets:
• Natural Questions
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 701 · 05.06.2025, 05:33
Article Title:
Vision as LoRA
Article Date: 26 Mar 2025
Article Description:
We introduce Vision as LoRA (VoRA), a novel paradigm for transforming an LLM into an MLLM. Unlike prevalent MLLM architectures that rely on external vision modules for vision encoding, VoRA internalizes visual capabilities by integrating vision-specific LoRA layers directly into the LLM. This design allows the added parameters to be seamlessly merged into the LLM during inference, eliminating structural complexity and minimizing computational overhead. Moreover, inheriting the LLM's ability of handling flexible context, VoRA can process inputs at arbitrary resolutions. To further strengthen VoRA's visual capabilities, we introduce a block-wise distillation method that transfers visual priors from a pre-trained ViT into the LoRA layers, effectively accelerating training by injecting visual knowledge. Additionally, we apply bi-directional attention masks to better capture the context information of an image. We successfully demonstrate that with additional pre-training data, VoRA can perform comparably with conventional encode-based MLLMs. All training data, codes, and model weights will be released at https://github.com/Hon-Wong/VoRA.PDFAbstract
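The merge-at-inference property relies on the standard LoRA mechanic: low-rank adapters are trained inside the LLM and folded back into the base weights afterwards. A rough sketch with Hugging Face peft is below; the base model ID and target modules are arbitrary choices, and the vision-specific training, block-wise distillation, and bi-directional masks are omitted.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct",
                                            torch_dtype=torch.bfloat16)

# Vision-oriented LoRA layers injected into the attention projections
cfg = LoraConfig(r=64, lora_alpha=128,
                 target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(base, cfg)

# ... train only the LoRA parameters on image-text data (omitted) ...

# At inference the added parameters can be merged into the LLM,
# leaving a plain dense model with no extra modules.
merged = model.merge_and_unload()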
PDF Download Link:
https://arxiv.org/pdf/2503.20680v1.pdf
GitHub:
• https://github.com/hon-wong/vora
Datasets:
• MM-Vet
• Google Landmarks Dataset v2
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 824 · 04.06.2025, 13:39
Telegram update: job seekers can now advertise their expertise and job opportunities without having to go through the channel owner.
Views: 792 · 04.06.2025, 07:30
Article Title:
s3: You Don't Need That Much Data to Train a Search Agent via RL
Article Date: 20 May 2025
Article Description:
Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
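The Gain Beyond RAG reward has a very direct reading: score the generator's answer with the searcher's retrieved context, score it again with naive RAG retrieval, and reward the difference. A minimal sketch with hypothetical retrieve/generate/score helpers (not the actual s3 code):

def gain_beyond_rag(question, gold_answer, searcher, naive_retriever, generator, score):
    """Reward = generation accuracy with the trained searcher's context
              - generation accuracy with naive RAG retrieval."""
    docs_searcher = searcher(question)        # multi-turn, RL-trained searcher
    docs_naive = naive_retriever(question)    # naive single-shot retrieval baseline

    ans_searcher = generator(question, docs_searcher)  # generator stays frozen
    ans_naive = generator(question, docs_naive)

    # score() is any generation-accuracy metric (exact match, judge score, ...)
    return score(ans_searcher, gold_answer) - score(ans_naive, gold_answer)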
PDF Download Link:
https://arxiv.org/pdf/2505.14146v1.pdf
GitHub:
• https://github.com/pat-jj/s3
Datasets:
• Natural Questions
• TriviaQA
• HotpotQA
• MedQA
• PubMedQA
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 825 · 04.06.2025, 06:39
Article Title:
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Article Date: 27 Apr 2025
Article Description:
Hallucinations are a persistent problem with Large Language Models (LLMs). As these models become increasingly used in high-stakes domains, such as healthcare and finance, the need for effective hallucination detection is crucial. To this end, we propose a versatile framework for zero-resource hallucination detection that practitioners can apply to real-world use cases. To achieve this, we adapt a variety of existing uncertainty quantification (UQ) techniques, including black-box UQ, white-box UQ, and LLM-as-a-Judge, transforming them as necessary into standardized response-level confidence scores ranging from 0 to 1. To enhance flexibility, we introduce a tunable ensemble approach that incorporates any combination of the individual confidence scores. This approach enables practitioners to optimize the ensemble for a specific use case for improved performance. To streamline implementation, the full suite of scorers is offered in this paper's companion Python toolkit, UQLM. To evaluate the performance of the various scorers, we conduct an extensive set of experiments using several LLM question-answering benchmarks. We find that our tunable ensemble typically surpasses its individual components and outperforms existing hallucination detection methods. Our results demonstrate the benefits of customized hallucination detection strategies for improving the accuracy and reliability of LLMs.
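The tunable ensemble amounts to a weighted combination of individual confidence scorers, each already normalized to [0, 1]. The snippet below is a generic sketch of that idea, not the API of the UQLM toolkit mentioned in the abstract.

from typing import Callable, Sequence

def ensemble_confidence(response: str,
                        scorers: Sequence[Callable[[str], float]],
                        weights: Sequence[float]) -> float:
    """Weighted average of response-level confidence scores in [0, 1]."""
    assert len(scorers) == len(weights) and abs(sum(weights) - 1.0) < 1e-6
    scores = [s(response) for s in scorers]   # black-box UQ, white-box UQ, LLM judge, ...
    return sum(w * c for w, c in zip(weights, scores))

# Example with dummy scorers; in practice the weights would be tuned on labeled data
dummy = [lambda r: 0.8, lambda r: 0.6, lambda r: 0.9]
print(ensemble_confidence("Paris is the capital of France.", dummy, [0.5, 0.2, 0.3]))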
PDF Download Link:
https://arxiv.org/pdf/2504.19254v2.pdf
GitHub:
• https://github.com/cvs-health/uqlm
Datasets:
• GSM8K
• SVAMP
• PopQA
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 892 · 03.06.2025, 14:04
Article Title:
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
Article Date: 27 Mar 2025
Article Description:
This paper presents a unified approach to understanding dynamic scenes from casual videos. Large pretrained vision foundation models, such as vision-language, video depth prediction, motion tracking, and segmentation models, offer promising capabilities. However, training a single model for comprehensive 4D understanding remains challenging. We introduce Uni4D, a multi-stage optimization framework that harnesses multiple pretrained models to advance dynamic 3D modeling, including static/dynamic reconstruction, camera pose estimation, and dense 3D motion tracking. Our results show state-of-the-art performance in dynamic 4D modeling with superior visual quality. Notably, Uni4D requires no retraining or fine-tuning, highlighting the effectiveness of repurposing visual foundation models for 4D understanding.
PDF Download Link:
https://arxiv.org/pdf/2503.21761v1.pdf
GitHub:
• https://github.com/Davidyao99/uni4d
Datasets:
• KITTI
• DAVIS
• TUM RGB-D
• MPI Sintel
• Bonn RGB-D Dynamic
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 888 · 03.06.2025, 04:17
Article Title:
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Article Date: 6 May 2025
Article Description:
With the growing requirement for natural human-computer interaction, speech-based systems receive increasing attention as speech is one of the most common forms of daily communication. However, existing speech models still experience high latency when generating the first audio token during streaming, which poses a significant bottleneck for deployment. To address this issue, we propose VITA-Audio, an end-to-end large speech model with fast audio-text token generation. Specifically, we introduce a lightweight Multiple Cross-modal Token Prediction (MCTP) module that efficiently generates multiple audio tokens within a single model forward pass, which not only accelerates inference but also significantly reduces the latency for generating the first audio in streaming scenarios. In addition, a four-stage progressive training strategy is explored to achieve model acceleration with minimal loss of speech quality. To our knowledge, VITA-Audio is the first multi-modal large language model capable of generating audio output during the first forward pass, enabling real-time conversational capabilities with minimal latency. VITA-Audio is fully reproducible and is trained on open-source data only. Experimental results demonstrate that our model not only achieves an inference speedup of 3-5x at the 7B parameter scale, but also significantly outperforms open-source models of similar size on multiple benchmarks for automatic speech recognition (ASR), text-to-speech (TTS), and spoken question answering (SQA) tasks.
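The latency trick, emitting several audio tokens from a single forward pass, can be illustrated with a toy multi-token prediction head; this is a simplification for intuition, not the actual MCTP module.

import torch
import torch.nn as nn

class MultiTokenAudioHead(nn.Module):
    """Toy head: map one LLM hidden state to k audio-token predictions at once."""
    def __init__(self, hidden_dim: int, audio_vocab: int, k: int = 4):
        super().__init__()
        self.k = k
        self.audio_vocab = audio_vocab
        self.proj = nn.Linear(hidden_dim, k * audio_vocab)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) -> logits: (batch, k, audio_vocab)
        return self.proj(h).view(h.size(0), self.k, self.audio_vocab)

head = MultiTokenAudioHead(hidden_dim=4096, audio_vocab=1024, k=4)
hidden = torch.randn(2, 4096)
next_audio_tokens = head(hidden).argmax(dim=-1)  # 4 audio tokens per forward pass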
PDF Download Link:
https://arxiv.org/pdf/2505.03739v1.pdf
GitHub:
• https://github.com/vita-mllm/vita-audio
Datasets:
• LibriSpeech
• TriviaQA
• LibriTTS
• AISHELL-1
• FLEURS
• VoxPopuli
• LIMA
• GigaSpeech
• Multilingual LibriSpeech
• AISHELL-2
• WenetSpeech
• MathInstruct
==================================
For more data science resources:
✓ https://t.me/DataScienceT
Views: 887 · 02.06.2025, 15:59
Channel reviews
Rating: 4.6
2 reviews over 6 months
Excellent: 50% (in the last 6 months)
Very good: 50% (in the last 6 months)

**ffeenold@******.io (on the service since June 2022)
13.05.2025, 16:38 · Rating: 5
Everything is fine. Thank you!
Channel statistics
Rating: 28.4
Review rating: 4.6
Channel rating: 83
Subscribers: 29.9K
APV: (locked)
ER: 2.0%
Posts per day: 2.0
CPM: (locked)