🗣️ ChatGPT's Voice Mode goes live!

Also: Meta’s Advanced AI for Video and Image Segmentation


Welcome, AI enthusiasts

It’s shaping up to be an exciting week in the world of AI, and we’re only halfway through! OpenAI has just released an alpha version of ChatGPT's Advanced Voice Mode, starting with ChatGPT Plus subscribers. Meta is also making waves with SAM 2, an enhanced version of its Segment Anything Model, now offering seamless object segmentation in both videos and images. Meanwhile, Midjourney has rolled out version 6.1, boasting significant upgrades like improved image quality and faster processing times. Let’s dive in!

In today’s insights:

  • ChatGPT's Voice Mode Goes Live for Some Users

  • Meta’s Advanced AI for Video and Image Segmentation

  • Midjourney V6.1: Enhanced AI Image Generation

Read time: 4 minutes

🗞️ LATEST DEVELOPMENTS

Evolving AI: OpenAI has started testing its "Advanced Voice Mode" with a select group of ChatGPT Plus subscribers.

Key Points:

  • The new feature is rolling out gradually, initially to ChatGPT Plus users, with broader availability planned for fall 2024.

  • GPT-4o's voice capabilities include emotional intonations and lower latency.

  • OpenAI has implemented strict controls to prevent misuse, including blocking unauthorized voice impersonations.

Details:

OpenAI has started rolling out its Her-like voice mode for ChatGPT. Unlike previous versions, it integrates all audio processing into a single model, improving response times and emotional sensitivity. The launch follows a cautious approach due to ethical concerns, particularly around impersonation and copyright. OpenAI tested the voice capabilities of GPT-4o, the model powering Advanced Voice Mode, with over 100 external red teamers across 45 languages. To protect user privacy, the model speaks only in four preset voices, and the company has implemented systems to block output that deviates from them.

Why It Matters:

The long-awaited Advanced Voice Mode is finally available, and we are likely to see many companies follow suit by releasing their own voice modes to provide people with AI assistants. Both Amazon and Apple have announced plans to update their voice technology to remain competitive.

Learn AI-led Business & startup strategies, tools, & hacks worth a Million Dollars (free AI Masterclass) 🚀

This incredible 3-hour Crash Course on AI & ChatGPT (worth $399), designed for founders & entrepreneurs, will help you 10x your business, revenue, team management & more.

It has been taken by 1 Million+ founders & entrepreneurs across the globe, who have been able to:

  • Automate 50% of their workflow & scale their business

  • Make quicker & smarter decisions for their company using AI-led data insights

  • Write emails, content & more in seconds using AI

  • Solve complex problems, research 10x faster & save 16 hours every week

Source: AI at Meta

Evolving AI: Meta introduces a new AI model, Segment Anything Model 2 (SAM 2), which it says can identify which pixels belong to a given object in a video.

Key Points:

  • SAM 2 integrates image and video segmentation, excelling in accuracy and real-time performance.

  • Released under Apache 2.0, promoting innovation in various domains.

  • The new SA-V dataset includes 51,000 videos.

Details:

Using SAM 2, video editors could isolate and manipulate objects within a scene far more easily than current editing software allows, and without manually adjusting each frame. Meta envisions SAM 2 transforming interactive video, too: users could select and manipulate objects within live videos or virtual spaces thanks to the AI model.

Meta thinks SAM 2 could also play a crucial role in the development and training of computer vision systems, particularly in autonomous vehicles. Accurate and efficient object tracking is essential for these systems to interpret and navigate their environments safely. SAM 2’s capabilities could expedite the annotation process of visual data, providing high-quality training data for these AI systems.
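At its core, what a segmentation model like SAM 2 produces is a per-pixel binary mask: True where a pixel belongs to the object, False elsewhere. Isolating or editing just that object then becomes simple array arithmetic. Here is a minimal illustration with NumPy; the mask is hand-made for the example rather than produced by SAM 2:

```python
import numpy as np

# A tiny 4x4 single-channel "image"; values are pixel intensities.
image = np.arange(16, dtype=np.float32).reshape(4, 4)

# A binary mask of the kind a segmentation model emits:
# True where a pixel belongs to the object, False elsewhere.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # pretend the "object" occupies the center 2x2 block

# Isolate the object: keep masked pixels, zero out the background.
isolated = np.where(mask, image, 0.0)

# Manipulate only the object: brighten it by 10 while leaving
# the background untouched.
edited = image.copy()
edited[mask] += 10.0

print(int(mask.sum()))  # 4 pixels belong to the object
print(isolated[1, 1])   # 5.0 — object pixel kept
print(isolated[0, 0])   # 0.0 — background zeroed
```

The hard part, of course, is producing the mask in the first place; that is what SAM 2 automates, frame by frame, across an entire video.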

Why It Matters:

The researchers see SAM 2 as an important advance in machine vision that could drive further research and applications, such as robots that can move and interact with the real world more reliably, or enabling video effects in generative AI video models. Meta is releasing the SAM 2 model, code, and weights as open source under the Apache 2.0 license.

Evolving AI: Midjourney V6.1 is live with upgrades including better image quality, faster processing, and new personalization features.

Key Points:

  • Improved coherence in images for complex subjects.

  • Enhanced image quality and detailed small features.

  • New upscaling options for better texture and resolution.

Details:

V6.1 introduces more coherent and detailed images, reducing pixel artifacts and enhancing textures. The model excels in rendering small features like eyes and hands; new upscalers boost image quality. Processing speed is 25% faster. Improved text accuracy and a new personalization model add nuance. However, inpainting/outpainting remains the same as V6.0.

Why It Matters:

V6.1 arrives seven months after the V6 launch. Overall, it is a notable improvement to Midjourney, offering subtle but significant changes in areas where the base model struggled. It is also a promising sign of what is to come in V7.

💡 Tip of the Day

In the interview below, former Google CEO Eric Schmidt talks about where AI is headed and how to handle competition with China.

🎯SNAPSHOTS

Direct links to relevant AI articles.

🎨 Shutterstock: New Generative 3D allows users to create 3D objects and 360-degree backdrops.

🤝 Canva: Canva acquires AI image startup Leonardo AI.

📈Trending AI Tools

  • 💹 Profit Leap - A tool for business intelligence and strategic insights (link)

  • 🚀 Taped - AI-powered workflow management (link)

  • 📈 Persuva - Use AI to scale your dropshipping store (link)

  • 🎥 Depthify.ai - A tool to convert videos into 3D spatial videos (link)

  • 📹 Loom - Record better and smoother videos (link)

  • 💡 Inline Help - A tool for contextual support and knowledge directly within a website or app (link)
