Weekly tech roundup: Web Data Drought Threatens AI Development, MIT Study Reveals

The most interesting tech stories of the week.

FROM FRONT

Hey there, and happy Friday!

I hope everyone’s had a great week. With our summer schedule in full swing, we’re sharing a weekly roundup of the coolest tech news I’ve come across. We’ll be back to our daily updates in early to mid-August. Enjoy!

As our newsletter grows, so do our advertising opportunities. If you want an ad-free reading experience, consider our LITE subscription for only $2 per month. Test it out with a 14-day free trial.

Best regards, Kris

TECH BRIEF

New data restrictions pose serious challenges to AI companies: an MIT-led study found a dramatic decline in the content available for AI training. Publishers and online platforms are taking steps to prevent their data from being harvested, with 5% of all data and 25% of data from high-quality sources now restricted. This hits smaller AI companies and academic researchers hardest, since they rely on these public datasets. Meanwhile, major tech firms are turning to alternative data sources, including synthetic data generated by AI systems and licensing deals with publishers.

Google has dropped its plan to phase out third-party cookies in its Chrome browser amid opposition from industry players and regulators. Ad-tech companies had argued that the change would strengthen Google's dominance while making ad tracking harder for everyone else. Instead, Google will prompt users to choose whether to enable or disable cookies, which benefits advertisers and largely preserves the current user experience. Despite this decision, proponents of stronger online privacy continue to apply pressure. The news lifted ad-tech stocks, with some posting modest gains.

In the first half of 2024, startups building generative AI products raised $12.3 billion from VCs across 225 deals. Early-stage startups, including Elon Musk’s xAI and China’s Moonshot AI, attracted the most funding. However, these companies face potential copyright issues, high training data costs, and profitability challenges. Some investors worry that, unless these obstacles are overcome, the generative AI investment bubble could burst.

In a move igniting discussions around fair access to data, Reddit has started blocking its content from appearing in the results of all search engines except Google. The decision follows a $60 million deal earlier this year, which included using Reddit data to train Google’s AI models. It has left non-Google search engines (such as Bing, DuckDuckGo, Mojeek, and Qwant) unable to retrieve recent or complete Reddit data. Kagi, a paid engine, retains access only because it buys part of its search index from Google.
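For readers curious how a site can admit one crawler and turn the rest away, here is a minimal sketch. It assumes the restriction is expressed through a robots.txt file, which the story above does not confirm, and the rules shown are a hypothetical example rather than Reddit's actual file. The helper name `allowed_crawlers`, the crawler names, and the example.com URL are also illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration, NOT Reddit's real file):
# one named crawler is allowed everywhere, every other crawler is blocked.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_crawlers(robots_txt: str, user_agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether the rules permit fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


if __name__ == "__main__":
    result = allowed_crawlers(
        HYPOTHETICAL_ROBOTS_TXT,
        ["Googlebot", "bingbot", "DuckDuckBot"],
        "https://example.com/r/some-thread",  # placeholder URL
    )
    for agent, permitted in result.items():
        print(f"{agent}: {'allowed' if permitted else 'blocked'}")
```

Note that robots.txt is only a request that well-behaved crawlers choose to honor. A site that wants to enforce a block typically needs server-side measures as well.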

Researchers at École polytechnique fédérale de Lausanne (EPFL) have found a security flaw in leading AI language models such as GPT-4o: malicious queries rewritten in the past tense can bypass safeguards. With 20 past-tense reformulation attempts, the success rate of such attacks rose from 1% to 88%. This vulnerability could weaken existing security measures and raises concerns about the unpredictability of the technology.

OpenAI's GPT-4o mini offers capabilities beyond those of previous models at a heavily discounted price. The model supports 128k input tokens (both images and text) and 16k output tokens, and it outperforms rivals including Claude 3 Haiku and Gemini 1.5 Flash. GPT-4o mini also uses the instruction hierarchy method to improve its resistance to jailbreaks, prompt injections, and system prompt extraction.

On its recent earnings call, Netflix announced plans to nearly double its video game catalog, with 80 games currently in development. The streaming giant plans to release one game each month, focusing primarily on 'interactive narrative games' based on Netflix's intellectual property. The move comes as Netflix explores new content categories amid slowing subscriber growth.

TikTok is set to launch its in-app shopping platform in Spain and Ireland in October as part of a scaled-back European e-commerce rollout. Further expansion is planned for 2025.

OpenAI has revealed Rule-Based Rewards (RBR), a new approach that automates parts of AI model fine-tuning for alignment with safety policies, cutting the time needed and reducing human subjectivity.

QUICK

Mistral Unveils AI Model with Enhanced Performance, Takes on Llama 3.1 - Mistral, a French AI startup, has unveiled Mistral Large 2, a next-generation AI model with improved multilingual and code-generation capabilities. Licensed for non-commercial research, the model competes closely with Meta's Llama 3.1 in performance.

Amazon Prime Day Propels US Online Sales to a Record $14.2 Billion - Amazon's Prime Day event led to a record $14.2 billion in US online sales, with back-to-school and tech products as main drivers. Concurrently, shoppers exhibited a trend towards more conscious, less extravagant purchasing habits.

Alphabet Earnings Satisfy, YouTube Ad Revenue Falls Short - Alphabet met Q2 earnings estimates but fell short on YouTube ad revenue. Revenue rose 14% YoY, with Google Cloud surpassing $10 billion in quarterly revenue for the first time. Alphabet plans a $5 billion multiyear investment in Waymo.

AI Startup Cohere Soars to $5.5B Valuation in Fresh Funding Round - AI firm Cohere Inc. has been valued at $5.5 billion following a $500 million Series D funding round. The company specializes in language model software for business applications.

Global Tech Crisis: Faulty CrowdStrike Update Disrupts Multiple Industries - A defective CrowdStrike update has pushed thousands of Windows systems worldwide into a recovery boot loop, disrupting multiple sectors, including banks, airlines, and TV broadcasting.
