Welcome to AI Today TechTalk – where we geek out about the coolest, craziest, and most mind-blowing stuff happening in the world of Artificial Intelligence! 🚀
This is your AI crash course, snackable podcast-style. Think of it as your weekly dose of cutting-edge research, jaw-dropping breakthroughs, and “Wait, AI can do THAT?!” moments. We take the techy, brain-bending papers and news, break them down, and serve them up with a side of humor and a whole lot of fun.
Whether you’re an AI superfan, a tech wizard, or just someone who loves knowing what’s next in the tech world, this channel has something for you.
📻 Latest episodes of AI Today
Here are the newest episodes available via the RSS feed:
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | #ai #2025 #genai #google (00:16:23)
Paper: https://arxiv.org/pdf/2501.17161
This research paper compares supervised fine-tuning (SFT) and reinforcement learning (RL) for post-training foundation models. Using novel and existing tasks i...
Paper: https://github.com/deepseek-ai/Janus/blob/main/janus_pro_tech_report.pdf
Github: https://github.com/deepseek-ai/Janus/tree/main?tab=readme-ov-file
The paper introduces Janus-Pro, an improved m...
Memory Layers at Scale | #ai #2024 #genai #meta (00:14:59)
Paper: https://arxiv.org/pdf/2412.09764
This research paper explores the effectiveness of memory layers in significantly enhancing large language models (LLMs). By incorporating a trainable key-value...
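For a feel of the idea, here is a minimal, hedged sketch of a trainable key-value memory layer in PyTorch. The paper scales this up with product-key lookup and sparse updates; this toy version just does a dense top-k lookup, and all names (ToyMemoryLayer, num_slots, top_k) are illustrative, not the paper's API.

```python
# Toy trainable key-value memory layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMemoryLayer(nn.Module):
    def __init__(self, d_model: int, num_slots: int = 1024, top_k: int = 4):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)                        # hidden state -> memory query
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)     # trainable keys
        self.values = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)   # trainable values
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                                # (B, T, D)
        scores = q @ self.keys.t()                            # similarity to every slot, (B, T, S)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)               # (B, T, k)
        gathered = self.values[top_idx]                       # (B, T, k, D)
        mem_out = (weights.unsqueeze(-1) * gathered).sum(dim=-2)
        return x + mem_out                                    # residual, drop-in alongside FFN layers

x = torch.randn(2, 8, 64)
print(ToyMemoryLayer(64)(x).shape)  # torch.Size([2, 8, 64])
```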
Large Concept Models: Language Modeling in a Sentence Representation Space | #ai #2024 #genai (00:29:20)
Paper: https://scontent-dfw5-1.xx.fbcdn.net/...
This research paper introduces Large Concept Models (LCMs), a novel approach to language modeling that operates on sentence embeddings instead of indi...
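A rough sketch of what "modeling in a sentence-embedding space" can look like, assuming PyTorch: random vectors stand in for real sentence embeddings (the paper uses a dedicated sentence encoder), and a small network regresses the next sentence's embedding from the previous ones. Everything here is a placeholder, not the paper's architecture.

```python
# Toy next-sentence-embedding predictor (placeholder embeddings, illustrative only).
import torch
import torch.nn as nn

EMB_DIM, CONTEXT = 256, 4

class ToyConceptModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CONTEXT * EMB_DIM, 512), nn.GELU(), nn.Linear(512, EMB_DIM)
        )

    def forward(self, context_embs: torch.Tensor) -> torch.Tensor:
        # context_embs: (batch, CONTEXT, EMB_DIM) -> predicted embedding of the next sentence
        return self.net(context_embs.flatten(1))

model = ToyConceptModel()
docs = torch.randn(32, CONTEXT + 1, EMB_DIM)           # stand-in "documents" of 5 sentence embeddings
pred = model(docs[:, :CONTEXT])                        # predict embedding of sentence 5
loss = nn.functional.mse_loss(pred, docs[:, CONTEXT])  # regression objective in embedding space
print(loss.item())
```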
DeepSeek v3 | #ai #2024 #genai (00:28:35)
Technical Report: https://arxiv.org/pdf/2412.19437
Github: https://github.com/deepseek-ai/DeepSe...
This research paper introduces DeepSeek-V3, a 671-billion parameter Mixture-of-Experts (MoE) large ...
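To make the MoE idea concrete, here is a minimal top-k routing sketch in PyTorch. DeepSeek-V3 itself uses many fine-grained experts, shared experts, and its own load-balancing scheme; this toy router only shows the core pattern of sending each token to its top-k experts and weight-summing their outputs. Class and parameter names are made up for illustration.

```python
# Toy top-k Mixture-of-Experts layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, num_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                         # x: (tokens, d_model)
        scores = self.gate(x)                     # (tokens, experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):            # loops for clarity; real systems batch by expert
            for e in range(len(self.experts)):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([16, 64])
```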
VISION TRANSFORMERS NEED REGISTERS | #ai #2024 #genai #meta (00:33:17)
Paper: https://arxiv.org/pdf/2309.16588
This research paper examines artifacts in vision transformer feature maps, specifically high-norm tokens appearing in non-informative image areas. The authors ...
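The fix the paper proposes, "registers", is simple enough to sketch, assuming PyTorch: a few extra learnable tokens are appended to the patch sequence, take part in attention, and are discarded at the output. The module below is a stripped-down stand-in, not the authors' code.

```python
# Toy ViT with register tokens (illustrative only).
import torch
import torch.nn as nn

class ToyViTWithRegisters(nn.Module):
    def __init__(self, d_model=192, num_patches=196, num_registers=4, depth=2):
        super().__init__()
        self.patch_embed = nn.Linear(768, d_model)           # stand-in for the conv patchifier
        self.registers = nn.Parameter(torch.zeros(1, num_registers, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_registers = num_registers

    def forward(self, patches):                              # patches: (B, 196, 768)
        x = self.patch_embed(patches)
        regs = self.registers.expand(x.size(0), -1, -1)      # one copy per image
        x = torch.cat([x, regs], dim=1)                      # patches and registers attend jointly
        x = self.encoder(x)
        return x[:, :-self.num_registers]                    # drop registers; keep patch features

imgs = torch.randn(2, 196, 768)
print(ToyViTWithRegisters()(imgs).shape)  # torch.Size([2, 196, 192])
```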
Byte Latent Transformer: Scaling Language Models with Patches | #ai #2024 #genai (00:21:34)
Paper: https://arxiv.org/pdf/2412.09871v1.pdf
The paper introduces the Byte Latent Transformer (BLT), a novel large language model architecture that processes raw byte data without tokenization. BLT ...
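A toy sketch of byte patching in plain Python: BLT groups raw bytes into variable-length patches using a learned entropy model, while the stand-in below just uses a byte-frequency "surprise" score and a hand-picked threshold, so it only illustrates the shape of the idea.

```python
# Toy byte patching with a frequency-based surprise score (illustrative only).
import math
from collections import Counter

def patch_bytes(data: bytes, threshold: float = 5.0, max_patch: int = 16):
    """Start a new patch whenever the next byte looks 'surprising' (rare in this stream)."""
    counts = Counter(data)
    total = len(data)
    surprise = {b: -math.log2(counts[b] / total) for b in counts}

    patches, current = [], bytearray()
    for b in data:
        if current and (surprise[b] > threshold or len(current) >= max_patch):
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

text = "Byte Latent Transformers patch raw bytes instead of using a tokenizer.".encode()
for p in patch_bytes(text):
    print(p)
```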
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | #ai #2024 #genai (00:20:56)
This research paper introduces CosyVoice 2, an improved streaming speech synthesis model. Building upon its predecessor, CosyVoice 2 utilizes advancements in large language models (LLMs) and incorpora...
OpenAI's o3 and o3-mini: A New Frontier in AI | #ai #2024 #genai (00:22:28)
Blog: https://openai.com/12-days/
OpenAI announced two new large language models, o3 and o3-mini, showcasing significantly improved performance on various benchmarks, including coding, mathematics, ...
Alignment Faking in Large Language Models | #ai #2024 #genai (00:14:41)
Paper: https://arxiv.org/pdf/2412.14093
This research paper explores "alignment faking" in large language models (LLMs). The authors designed experiments to provoke LLMs into concealing their true pr...
Veo 2, Imagen 3, and Whisk: State-of-the-Art AI Image and Video Generation | #ai #2024 #genai (00:19:24)
Blog: https://blog.google/technology/google...
Google announced updates to its AI video and image generation models, Veo 2 and Imagen 3, boasting state-of-the-art capabilities in realism and style d...
Allegro: Open the Black Box of Commercial-Level Video Generation Model | #ai #2024 #genai (00:19:24)
Paper: https://arxiv.org/pdf/2411.01747
This research report introduces Allegro, a novel, open-source text-to-video generation model that surpasses existing open-source and many commercial models in ...
DynaSaur: Large Language Agents Beyond Predefined Actions | #ai #2024 #genai (00:19:24)
Paper: https://arxiv.org/pdf/2411.01747
The paper "DynaSaur: Large Language Agents Beyond Predefined Actions" introduces a novel large language model (LLM) agent framework that dynamically generates ...
STAR ATTENTION: EFFICIENT LLM INFERENCE OVER LONG SEQUENCES | #ai #2024 #genai (00:16:58)
Paper: https://arxiv.org/pdf/2411.17116
The paper introduces Star Attention, a novel two-phase attention mechanism for efficient Large Language Model (LLM) inference on long sequences. It improves co...
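The two-phase pattern can be illustrated with a small PyTorch sketch: phase 1 encodes context blocks with attention limited to an anchor block plus the block itself, and phase 2 lets query tokens attend over the whole context. The real method distributes phase 2 with online softmax aggregation across hosts; this single-head, single-host toy only shows the shapes, and the function name is made up.

```python
# Toy two-phase (local then global) attention (illustrative only).
import torch
import torch.nn.functional as F

def toy_star_attention(context, query, block_size=4):
    # context: (ctx_len, d), query: (q_len, d); single head, no projections
    blocks = context.split(block_size)
    anchor = blocks[0]
    encoded = []
    for blk in blocks:                                     # phase 1: blockwise local attention
        kv = blk if blk is anchor else torch.cat([anchor, blk])
        attn = F.softmax(blk @ kv.t() / kv.size(-1) ** 0.5, dim=-1)
        encoded.append(attn @ kv)
    ctx_enc = torch.cat(encoded)

    kv = torch.cat([ctx_enc, query])                       # phase 2: query attends globally
    attn = F.softmax(query @ kv.t() / kv.size(-1) ** 0.5, dim=-1)
    return attn @ kv

ctx, q = torch.randn(16, 32), torch.randn(2, 32)
print(toy_star_attention(ctx, q).shape)  # torch.Size([2, 32])
```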
FERRET-UI 2: MASTERING UNIVERSAL USER INTERFACE UNDERSTANDING ACROSS PLATFORMS | #ai #2024 #genai (00:14:56)
Paper: https://arxiv.org/pdf/2410.18967
The paper introduces Ferret-UI 2, a multimodal large language model (MLLM) that significantly improves upon its predecessor, Ferret-UI, by enabling universal u...
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation | #ai #2024 (00:14:55)
Paper: https://arxiv.org/abs/2411.00412
This research introduces a novel two-stage training method to improve Large Language Models' (LLMs) ability to solve complex scientific problems. The method, c...
Paper: https://arxiv.org/pdf/2411.02830
This research introduces Mixtures of In-Context Learners (MOICL), a novel approach to improve in-context learning (ICL) in large language models (LLMs). MOICL ...
Paper: https://arxiv.org/pdf/2411.04997
Github: https://github.com/microsoft/LLM2CLIP
The paper introduces LLM2CLIP, a method to improve the visual representation learning capabilities of CLIP by int...
Paper: https://arxiv.org/pdf/2411.14199
Github: https://github.com/AkariAsai/OpenScholar
The research introduces OpenScholar, a retrieval-augmented large language model (LLM) designed for synthesizin...
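For context, here is a minimal retrieve-then-prompt sketch in plain Python. OpenScholar's actual pipeline uses a trained dense retriever, reranking, and iterative self-feedback; the word-overlap retriever and prompt format below are placeholders that only show the generic retrieval-augmented pattern.

```python
# Toy retrieval-augmented prompting (illustrative only).
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(q_words & set(doc.lower().split())))
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using the citations below.\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Retrieval-augmented generation grounds answers in retrieved passages.",
    "Mixture-of-Experts models activate only a subset of parameters per token.",
    "Scientific QA benefits from citing the retrieved literature.",
]
query = "How does retrieval-augmented generation help scientific QA?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # this prompt would then be passed to the LLM
```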
Paper: https://arxiv.org/pdf/2401.03407
Github: https://github.com/ZhengPeng7/BiRefNet
This research introduces BiRefNet, a novel deep learning framework for high-resolution dichotomous image segment...
LLaVA-o1: Let Vision Language Models Reason Step-by-Step | #ai #genai #lvm #llm #mmm #cv #2024 (00:14:56)
Paper: https://arxiv.org/pdf/2411.10440
Github: https://github.com/PKU-YuanGroup/LLaV...
The paper introduces LLaVA-o1, a vision-language model designed for improved multi-stage reasoning. Unlike pre...
Model-Based Transfer Learning for Contextual Reinforcement Learning | #ai #mit #rl #genai #ml #2024 (00:14:56)
Paper: https://arxiv.org/pdf/2408.04498
This research introduces Model-Based Transfer Learning (MBTL), a novel framework for improving the efficiency and robustness of deep reinforcement learning (RL...
Diverse and Effective Red Teaming Auto-gen Rewards & Multi-step RL | #aisafety #openai #genai #2024 (00:14:56)
Paper: https://cdn.openai.com/papers/diverse...
Blog: https://openai.com/index/advancing-re...
This OpenAI research paper presents novel methods for automated red teaming of large language models (LL...
OpenAI’s Approach to External Red Teaming for AI Models and Systems | #aisafety #openai #genai #2024 (00:14:56)
Paper: https://cdn.openai.com/papers/openais...
Blog: https://openai.com/index/advancing-re...
This white paper details OpenAI's approach to external red teaming for AI models and systems. External r...
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | #2024 (00:14:56)
Paper: https://arxiv.org/pdf/2411.11922
Github: https://github.com/yangchris11/samurai
Blog: https://yangchris11.github.io/samurai/
The paper introduces SAMURAI, a novel visual object tracking method...