Writing @Pragmatic_Eng, the #1 software engineering newsletter on Substack. Author of @EngGuidebook. Formerly Uber & Skype.

Gergely Orosz

We know that when LLM tools are trained on LLM-generated output, they regress. It’s partly why companies like Google and OpenAI are licensing Reddit data to train their models: Reddit is assumed to be human content.

Well, it was. More and more of it will be subtle AI spam. https://t.co/HjHSettWQj

i built an ai agent that does marketing for me on autopilot! 🤯

it searches reddit for relevant posts, provides a valuable response to the users & promotes my product in a subtle and natural way.

i'm using claude 3.5 sonnet, it's so good i can't actually believe it haha https://t.co/u0xa5qQ46s

helping founders build AI apps quickly with https://t.co/OP9dQFYApV 

// s5 @_buildspace

Fekri

Which could be yet another reason why LLM evolution has stalled at the ChatGPT-4 level. 16 months later, no new model has made the kind of jump we saw from ChatGPT 3.5 to 4.0 (in just 6 months).

When your training data is increasingly AI-generated, it’s hard!!

Also, there's the problem that LLM tools do not respect robots.txt. They ingest every website, even if that site does NOT want to lend its content as free training material.

One response could be for these sites to serve heaps of LLM-generated garbage on some of their webpages.
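For context on the robots.txt point, here is a minimal sketch of what respecting it could look like on the crawler side, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleLLMBot` user agent, the `may_fetch` helper and the example.com URL are all made up for illustration; this is not how any particular LLM crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts one (made-up) LLM crawler out
# of everything, while allowing all other user agents.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"
    print(may_fetch(ROBOTS_TXT, "ExampleLLMBot", url))
    print(may_fetch(ROBOTS_TXT, "SomeBrowser", url))
```

The check is entirely voluntary: robots.txt only works if the crawler chooses to run something like this before fetching, which is exactly the problem.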

from Gergely Orosz | by Gergely Orosz