<!DOCTYPE html><html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="viewport" content="width=device-width"><meta name="x-apple-disable-message-reformatting"><title>TLDR Data</title><meta name="color-scheme" content="light dark"><meta name="supported-color-schemes" content="light dark"><style type="text/css"> :root { color-scheme: light dark; supported-color-schemes: light dark; } *, *:after, *:before { -webkit-box-sizing: border-box; -moz-box-sizing: border-box; box-sizing: border-box; } * { -ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%; } html, body, .document { width: 100% !important; height: 100% !important; margin: 0; padding: 0; } body { -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; text-rendering: optimizeLegibility; } div[style*="margin: 16px 0"] { margin: 0 !important; } table, td { mso-table-lspace: 0pt; mso-table-rspace: 0pt; } table { border-spacing: 0; border-collapse: collapse; table-layout: fixed; margin: 0 auto; } img { -ms-interpolation-mode: bicubic; max-width: 100%; border: 0; } *[x-apple-data-detectors] { color: inherit !important; text-decoration: none !important; } .x-gmail-data-detectors, .x-gmail-data-detectors *, .aBn { border-bottom: 0 !important; cursor: default !important; } .btn { -webkit-transition: all 200ms ease; transition: all 200ms ease; } .btn:hover { background-color: #f67575; border-color: #f67575; } * { font-family: Arial, Helvetica, sans-serif; font-size: 18px; } @media screen and (max-width: 600px) { .container { width: 100%; margin: auto; } .stack { display: block!important; width: 100%!important; max-width: 100%!important; } .btn { display: block; width: 100%; text-align: center; } } body, p, td, tr, .body, table, h1, h2, h3, h4, h5, h6, div, span { background-color: #FEFEFE !important; color: #010101 !important; } @media (prefers-color-scheme: dark) { body, p, td, tr, .body, table, h1, h2, h3, h4, h5, h6, div, span 
{ background-color: #27292D !important; color: #FEFEFE !important; } } a { color: inherit !important; text-decoration: underline !important; } </style><!--[if mso | ie]> <style type="text/css"> a { background-color: #FEFEFE !important; color: #010101 !important; } @media (prefers-color-scheme: dark) { a { background-color: #27292D !important; color: #FEFEFE !important; } } </style> <![endif]--></head><body class=""> <div style="display: none; max-height: 0px; overflow: hidden;">Iceberg V4 introduces a new content metadata tree that enables cheap single-file commits. Today, even a single data file write requires updating </div> <div style="display: none; max-height: 0px; overflow: hidden;"> <br> </div> <table align="center" class="document"><tbody><tr><td valign="top"> <table align="center" border="0" cellpadding="0" cellspacing="0" class="container" width="600"><tbody><tr class="inner-body"><td> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr class="header"><td bgcolor="" class="container"> <table width="100%"><tbody><tr><td class="container"> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" style="margin-top: 0px;" width="100%"><tbody><tr><td style="padding: 0px;"> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div style="text-align: center;"> <span style="margin-right: 0px;"><a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Ftldr.tech%2Fdata%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/Iq5CPZy_H2IA6S2oxYqZITd5ELf23DrjKUZxZHbOV84=420" rel="noopener noreferrer" target="_blank"><span>Sign Up</span></a> |<span style="margin-right: 2px; margin-left: 2px;"><a 
href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fadvertise.tldr.tech%3Futm_source=tldrdata%26utm_medium=newsletter%26utm_campaign=advertisetopnav/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/_dwum-ZJR9RKJdwZlvK9Qvw-GBN_bKp3tiN08xtDbYY=420" rel="noopener noreferrer" target="_blank"><span>Advertise</span></a></span>|<span style="margin-left: 2px;"><a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fa.tldrnewsletter.com%2Fweb-version%3Fep=1%26lc=1670a604-84b7-11f0-bcf5-55fc1d40139c%26p=a1095c30-86e1-11f0-b178-b73fd6ae4cf4%26pt=campaign%26t=1756721151%26s=bf4342a3c5b7553444ee27641ceabc1f3f94b4e3fc43e34dbffe14814bcebd96/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/fyLGuRCpaNOpXNuI-wBhbnoPGQJfYBgk61a0_uoikJM=420"><span>View Online</span></a></span> <br> </span></div> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="text-align: center;"><span data-darkreader-inline-color="" style="--darkreader-inline-color:#3db3ff; color: rgb(51, 175, 255) !important; font-size: 30px;">T</span><span style="font-size: 30px;"><span data-darkreader-inline-color="" style="color: rgb(232, 192, 96) !important; --darkreader-inline-color:#e8c163; font-size:30px;">L</span><span data-darkreader-inline-color="" style="color: rgb(101, 195, 173) !important; --darkreader-inline-color:#6ec7b2; font-size:30px;">D</span></span><span data-darkreader-inline-color="" style="--darkreader-inline-color:#dd6e6e; color: rgb(220, 107, 107) !important; font-size: 30px;">R</span> <br> </td></tr></tbody></table> <br> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr id="together-with"><td align="center" height="20" style="vertical-align:middle !important;" valign="middle" width="100%"><strong style="vertical-align:middle !important; height: 100%;">Together With </strong> <a 
href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fdelve.co%2Fbook-demo%3Futm_source=tldr%26utm_medium=newsletter%26utm_campaign=data-primary-sep01-25/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/2e6iJqkcRZUDjaJ6J56Cu8mfblxivbdF-xyXOVjl5MA=420"><img src="https://images.tldr.tech/delve.png" valign="middle" style="vertical-align: middle !important; height: 100%;" alt="Delve"></a></td></tr></tbody></table> <table style="table-layout: fixed; width:100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;"> <div style="text-align: center;"> <h1><strong>TLDR Data <span id="date">2025-09-01</span></strong></h1> </div> </td></tr></tbody></table> <table style="table-layout: fixed; width:100%;" width="100%"><tbody><tr id="sponsy-copy"><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fdelve.co%2Fbook-demo%3Futm_source=tldr%26utm_medium=newsletter%26utm_campaign=data-primary-sep01-25/2/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/HlLAQM34f101PTGpIgrNZbUkjt_u_MOR0YaM59Uj7-s=420"> <span> <strong>Today, compliance is your shortest relationship (Sponsor)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> We meet. We click. In 15 hours, you're SOC 2 compliant.<p></p><p>No ghosting, no red flags, just AI agents doing all the grunt work while you close deals.</p><p>Our happily-ever-afters:</p><ul><li>Lovable: SOC 2 in 20 hours.</li><li>Bland: $500K ARR in 7 days.</li><li>11x: $1.2M ARR unlocked.</li></ul><p>Fast, easy, no drama. 
It's not just done; it's done in <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fdelve.co%2Fblog%2Fseries-a%3Futm_source=tldr%26utm_medium=newsletter%26utm_campaign=data-primary-sep01-25/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/TitR0tkDF7qGJ7XhDSPNJON9Imv4VsbDgIVxB4jkwiI=420" rel="noopener noreferrer nofollow" target="_blank"><span>Delve</span></a>.</p> <p>Let's make it official. <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fdelve.co%2Fbook-demo%3Futm_source=tldr%26utm_medium=newsletter%26utm_campaign=data-primary-sep01-25/3/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/F7U0BQYklUTvDGp9LwldWL2sjMcoAAqAq99i-_j73nA=420" rel="noopener noreferrer nofollow" target="_blank"><span>Book your demo now</span></a> and get 1k off with code TLDR1KOFF. </p> </span></span></div> </td></tr></tbody></table> </td></tr></tbody></table> </td></tr></tbody></table> </td></tr> <tr bgcolor=""><td class="container"> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td style="padding: 0px;"> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"><span style="font-size: 36px;">📱</span></div></div> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"> <h1><strong>Deep Dives</strong></h1> </div> </div> </td></tr></tbody></table> <table style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top"> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 
15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.youtube.com%2Fwatch%3Fv=uWm-p--8oVQ%26utm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/Js91o4UNZq2OVkwT-5nkdj9Fna7Ss2Kj5WTUaT4zN8M=420"> <span> <strong>Iceberg V4 Single File Commits (57 minute video)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> Iceberg V4 introduces a new content metadata tree that enables cheap single-file commits. Today, even writing a single data file requires updating three metadata files (a manifest, the manifest list, and the metadata JSON), and this write amplification especially hurts the performance of small deletes. V4 addresses the problem with a root manifest per snapshot that can hold data files or manifests directly, plus inline manifest delete vectors that mark removals without rewriting the affected manifests. Aggregated column stats at the root level improve pruning, and an optional affinity mechanism links deletions to data manifests for faster planning. 
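<p>A toy Python model can make the write-amplification difference concrete. It is a sketch under stated assumptions, not the V4 spec: <code>RootManifest</code>, <code>append_file</code>, and <code>remove_leaf_entry</code> are hypothetical names, real manifests carry far richer entries, and the metadata JSON (or catalog) commit is left out of both layouts. It counts the manifest-tree files each commit writes: today's new manifest plus manifest list, versus one root manifest that holds new files directly and flags removals from existing leaf manifests in an inline delete vector.</p>
<pre style="white-space: pre-wrap;"><code class="language-python" style="font-family: Menlo, Consolas, monospace; font-size: 14px;">from dataclasses import dataclass


@dataclass(frozen=True)
class RootManifest:
    """Toy V4-style root manifest (one per snapshot); hypothetical, simplified."""
    direct_files: tuple = ()           # data files held directly by the root
    leaf_manifests: tuple = ()         # existing leaf manifests: tuples of paths
    leaf_dvs: frozenset = frozenset()  # inline delete vector: (manifest_idx, pos)


def v3_tree_writes():
    """Today: a commit writes a new manifest and a new manifest list."""
    return ["manifest", "manifest-list"]


def append_file(root, path):
    """Single-file append: only a new root manifest is written."""
    new_root = RootManifest(root.direct_files + (path,),
                            root.leaf_manifests, root.leaf_dvs)
    return new_root, ["root-manifest"]


def remove_leaf_entry(root, manifest_idx, pos):
    """Removal: flag the entry in the root's delete vector; the leaf is untouched."""
    new_dvs = root.leaf_dvs | {(manifest_idx, pos)}
    new_root = RootManifest(root.direct_files, root.leaf_manifests, new_dvs)
    return new_root, ["root-manifest"]


def live_files(root):
    """Files a reader sees: direct files plus leaf entries not marked deleted."""
    files = list(root.direct_files)
    for m_idx, manifest in enumerate(root.leaf_manifests):
        for pos, path in enumerate(manifest):
            if (m_idx, pos) not in root.leaf_dvs:
                files.append(path)
    return files


# Hypothetical table: one existing leaf manifest with two data files.
root = RootManifest(leaf_manifests=(("a.parquet", "b.parquet"),))
root, appended = append_file(root, "c.parquet")
root, removed = remove_leaf_entry(root, 0, 1)
print(len(v3_tree_writes()), len(appended), len(removed))  # 2 1 1
print(live_files(root))  # ['c.parquet', 'a.parquet']</code></pre>
<p>Because the leaf manifest is never rewritten in this toy, a small delete costs one small root-manifest write; the trade-off is that readers must consult the root's delete vector when listing files.</p>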
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.nb-data.com%2Fp%2F23-rag-pitfalls-and-how-to-fix-them%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/NCx9mBdeODzfMdHj_ATrCdSKSmPu_WoZbEgD3HE2SWg=420"> <span> <strong>23 RAG Pitfalls and How to Fix Them (14 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> RAG systems often fail because of issues in five layers: Data (poor chunking, stale or low-quality corpora, wrong or outdated embeddings, and ignored metadata), Retrieval (a single retrieval method, a badly tuned top-K, weak reranking, and context overflow or conflicts), Prompt (vague prompts, ambiguity, and multi-part questions), System (latency and poor scalability), and Safety/Trust (missing guardrails or monitoring, and bogus citations). Common remedies include hybrid retrieval (BM25 plus vector search) with metadata filters, periodic re-embedding, cross-encoder rerankers, dynamic top-K, prompt constraints, query decomposition, caching and approximate nearest-neighbor (ANN) search, distributed stores, and continuous human feedback. 
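<p>The summary names hybrid retrieval (BM25 plus vector) but not how the two result lists get merged. One widely used option, assumed here rather than taken from the article, is reciprocal rank fusion (RRF): each document scores the sum of 1/(k + rank) over the lists it appears in, with k = 60 as a common default. A minimal Python sketch with hypothetical document IDs:</p>
<pre style="white-space: pre-wrap;"><code class="language-python" style="font-family: Menlo, Consolas, monospace; font-size: 14px;">from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_k=None):
    """Fuse ranked lists of doc IDs with RRF (an assumed fusion method).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. A larger k flattens the edge of top ranks.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_k is None else fused[:top_k]


bm25_hits = ["doc3", "doc1", "doc7"]    # hypothetical keyword results
vector_hits = ["doc1", "doc4", "doc3"]  # hypothetical embedding results
print(reciprocal_rank_fusion([bm25_hits, vector_hits], top_k=3))
# ['doc1', 'doc3', 'doc4']</code></pre>
<p>Because RRF looks only at ranks, it avoids calibrating BM25 scores against cosine similarities, which live on different scales; in this sketch, metadata filters would be applied to each list before fusing.</p>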
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Ftech.instacart.com%2Fsimplifying-large-scale-llm-processing-across-instacart-with-maple-63df4508d5be%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/Wvw0q_jOjcD8gw5aPMFResxIYNR4Z16Z-dx_MTb6IRs=420"> <span> <strong>Simplifying Large-Scale LLM Processing across Instacart with Maple (9 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> Instacart's Maple service streamlines large-scale LLM processing by automating batch workflows, cutting costs by up to 50%. Built on Temporal for fault-tolerant execution and on Parquet files stored in S3, Maple hides the complexity of batch jobs, supports both batch and real-time LLM providers, and lets teams ship LLM features quickly without building custom infrastructure. 
</span> </span> </div> </td></tr></tbody></table> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"><span style="font-size: 36px;">🚀</span></div> </div> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"> <h1><strong>Opinions & Advice</strong></h1> </div> </div> </td></tr></tbody></table> <table style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top"> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fseattledataguy.substack.com%2Fp%2Fwhat-separates-good-from-great-data%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/PMQfhwBYFnfdmC3kQqKswZKaaTnq0juzGOkLhYDEMYk=420"> <span> <strong>What Separates Good From Great Data Teams (6 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> High-impact data teams distinguish themselves not by tool choice but by rigorously defining business problems, aligning solutions to stakeholder priorities, and owning outcomes beyond ticket closure. Measurable results tend to follow when teams communicate trade-offs clearly (for example, speed versus flexibility) instead of dwelling on technical minutiae. 
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.montecarlodata.com%2Fblog-the-new-dna-of-the-data-ai-team%2F%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/Mx-ohzGD6u2LTOwTHEiVVOwVuUGNJvz-3vrc3IAeGqg=420"> <span> <strong>The New DNA of the Data + AI Team (6 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> The evolving data + AI team requires unstructured data fluency, agent architecture expertise, retrieval and prompt engineering, robust evaluations, and end-to-end observability to build trustworthy AI agents. Organizations are adapting by embedding AI experts in product teams, partnering dedicated AI teams with business domains, or using AI to enhance internal data processes while transforming infrastructure to support scalable, AI-ready platforms. </span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fjessicatalisman.substack.com%2Fp%2Fmetadata-as-a-data-model%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/EzXlY-S5IkC9PAknaEVglySqD8XYR1BN0-eb7plL8FI=420"> <span> <strong>Metadata as a Data Model (8 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> Library science treats metadata as a holistic, interoperable data model using standards like MARC, FRBR, RDA, and BIBFRAME to enhance semantics and findability. 
Enterprise metadata, by contrast, is often fragmented and non-interoperable; adopting library-inspired frameworks could make it far more useful to AI-driven data systems. </span> </span> </div> </td></tr></tbody></table> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"><span style="font-size: 36px;">💻</span></div> </div> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"> <h1><strong>Launches & Tools</strong></h1> </div> </div> </td></tr></tbody></table> <table style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top"> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="http://tracking.tldrnewsletter.com/CL0/http:%2F%2Fconfessionsofadataguy.com%2Fthe-fastest-way-to-insert-data-to-postgres%2F%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/Q-JrQmYDPwAMfrSwpPEw0t-OKkAa4s7NPI2hAiy922U=420"> <span> <strong>The Fastest Way to Insert Data to Postgres (5 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> The fastest way to insert large datasets into PostgreSQL is the COPY command, which significantly outperforms row-by-row INSERT statements by minimizing per-row overhead. Paired with Spark's parallel processing and the psycopg driver, it loaded 22 million rows in under 14 minutes. 
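<p>COPY is fast partly because the client streams many rows as one data payload instead of sending one statement per row. A minimal Python sketch of building that payload, assuming PostgreSQL's default text format (tab-separated columns, <code>\N</code> for NULL, backslash escapes); <code>encode_copy_text</code> is a hypothetical helper, and it only builds the data, so no database is needed to run it.</p>
<pre style="white-space: pre-wrap;"><code class="language-python" style="font-family: Menlo, Consolas, monospace; font-size: 14px;">def encode_copy_text(rows):
    """Hypothetical helper: encode row tuples as PostgreSQL COPY text-format data."""
    def encode_value(value):
        if value is None:
            return "\\N"  # default NULL marker in text format
        text = str(value)
        # Escape backslash first, then the characters COPY treats specially.
        return (text.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))

    # One tab-separated line per row, each terminated by a newline.
    lines = ("\t".join(encode_value(v) for v in row) for row in rows)
    return "".join(line + "\n" for line in lines)


payload = encode_copy_text([(1, "alice", None), (2, "tab\there", 3.5)])
print(repr(payload))
# '1\talice\t\\N\n2\ttab\\there\t3.5\n'</code></pre>
<p>With psycopg 3, a payload like this could be streamed through <code>cursor.copy("COPY my_table (id, name, score) FROM STDIN")</code> and <code>copy.write(payload)</code>; alternatively, <code>copy.write_row(row)</code> lets the driver handle the encoding. <code>my_table</code> and its columns are placeholders.</p>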
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fgithub.com%2Fkruskal-labs%2Ftoolfront%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/lr-s33szFzsSOTYHSzyDTbnaLmvmcAGNOY3W_i-z94Y=420"> <span> <strong>Toolfront (GitHub Repo)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> ToolFront gives agents two read-only database tools for exploring data and answering questions quickly, supporting over 15 databases, data files, and APIs with zero configuration and predictable, structured results. It can be used directly, as an MCP server, or customized for any AI framework, simplifying development of database- and API-driven AI agents. </span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.cybertec-postgresql.com%2Fen%2Ftesting-rules%2F%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/j5FlBzavW0iyjXS5IC5AhfIKt9kE1bDUNTO9bPj9Y24=420"> <span> <strong>Testing Rules (5 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> PostgreSQL's autovacuum doesn't collect statistics for partitioned tables, because a partitioned parent stores no data itself. The missing statistics can lead to badly wrong query plans, such as a hash join with a 5,000x row-count misestimate. Running ANALYZE on partitioned tables manually, on a daily or weekly schedule, keeps their statistics accurate and helps the planner choose good plans. 
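<p>A scheduled job along these lines could keep parent-level statistics fresh. The Python sketch below, with hypothetical names, holds the catalog query that lists partitioned tables (<code>relkind = 'p'</code> in <code>pg_class</code>, PostgreSQL 10+) and turns its result rows into safely quoted <code>ANALYZE</code> statements; running them through a driver and the scheduler itself (cron or similar) are left out as assumptions.</p>
<pre style="white-space: pre-wrap;"><code class="language-python" style="font-family: Menlo, Consolas, monospace; font-size: 14px;"># Lists partitioned tables (relkind 'p'), whose statistics autovacuum never refreshes.
PARTITIONED_TABLES_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'p'
ORDER BY 1, 2
"""


def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def analyze_statements(tables):
    """Build one ANALYZE per (schema, table) row returned by the query above."""
    return [f"ANALYZE {quote_ident(schema)}.{quote_ident(table)};"
            for schema, table in tables]


rows = [("public", "events"), ("sales", 'odd"name')]  # hypothetical query result
for stmt in analyze_statements(rows):
    print(stmt)
# ANALYZE "public"."events";
# ANALYZE "sales"."odd""name";</code></pre>
<p>ANALYZE on a partitioned table samples rows from all of its partitions and also updates each partition's own statistics, so on large tables it may be worth scheduling for quiet hours.</p>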
</span> </span> </div> </td></tr></tbody></table> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"><span style="font-size: 36px;">🎁</span></div></div> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"><h1><strong>Miscellaneous</strong></h1></div> </div> </td></tr></tbody></table> <table bgcolor="" style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top"> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.youtube.com%2Fwatch%3Fv=GfH4QL4VqJ0%26utm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/I-SHWcZJANafgJdy2U-lZKvYTzEvRuc1OTcVYAR4HRE=420"> <span> <strong>Python: The Documentary | An origin story (84 minute video)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> This documentary traces Python's journey from a small side project in Amsterdam to the backbone of AI, data science, and global tech companies, told through the voices of its creators and community. It covers the language's open-source roots, community governance, key conflicts (such as the Python 3 transition), and how it became the language of choice for engineers and data teams worldwide. 
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fblog.cloudflare.com%2Fcrawlers-click-ai-bots-training%2F%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/GdACVvxG_GEsBeptYpmc0ag4obGVwYEIOa8zOEe2APs=420"> <span> <strong>The crawl-to-click gap: Cloudflare data on AI bots, training, and referrals (10 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> AI-driven web crawling grew 32% year over year in April, and training now accounts for 80% of AI bot activity, up from 72% a year earlier. GPTBot (share up to 28.1%) and Meta's crawler (up to 17.7%) overtook Amazonbot and Bytespider, while Google referrals to news sites have dropped 9-15% since January, a decline attributed to AI-generated search summaries. Crawl-to-refer imbalances remain massive: in July, Anthropic's Claude crawled about 38,000 pages for every user visit it referred, signaling content extraction without proportional publisher traffic or monetization. Bot verification via Web Bot Auth remains rare, raising compliance and spoofing risks for data platform owners. 
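<p>The two headline metrics are simple arithmetic: the crawl-to-refer ratio divides pages a platform's bots crawl by the visits that platform refers back, and year-over-year growth compares a period with the same period a year earlier. A tiny Python sketch with made-up counts (not Cloudflare's data); the function names are hypothetical:</p>
<pre style="white-space: pre-wrap;"><code class="language-python" style="font-family: Menlo, Consolas, monospace; font-size: 14px;">def crawl_to_refer_ratio(crawls, referrals):
    """Pages crawled per referred visit; None if there were no referrals."""
    if referrals == 0:
        return None  # the ratio is undefined without any referred visits
    return crawls / referrals


def yoy_growth_pct(current, previous):
    """Year-over-year change, in percent."""
    return (current - previous) / previous * 100


# Made-up counts for one site, for illustration only.
print(crawl_to_refer_ratio(3_800_000, 100))  # 38000.0
print(crawl_to_refer_ratio(5_000, 0))        # None
print(round(yoy_growth_pct(132, 100), 1))    # 32.0</code></pre>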
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fbenn.substack.com%2Fp%2Fthe-context-layer%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/h0QP1SoRlONEvYWmJjL_bClhy7w8V1lPbNmwZfZGltM=420"> <span> <strong>The context layer (9 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> Fragmented metric definitions in the modern data stack have led to inconsistent analytics and business confusion, fueling interest in a centralized metrics layer. Despite clear technical value, such layers have struggled for adoption because of challenging economics: vendors found standalone layers difficult to monetize without owning the BI/user experience. AI agents that need curated, governed business context revive the same question, suggesting that central semantic repositories remain compelling in theory but hard to commercialize. 
</span> </span> </div> </td></tr></tbody></table> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"><span style="font-size: 36px;">⚡</span></div></div> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;"> <div class="text-block"> <div style="text-align: center;"> <h1><strong>Quick Links</strong></h1> </div> </div> </td></tr></tbody></table> <table bgcolor="" style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top"> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fcloud.google.com%2Fblog%2Fproducts%2Fdata-analytics%2Fcommitting-to-apache-iceberg-with-our-ecosystem-partners%2F%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/e_mzYyswVqQyLcDHzAg8sAHwcRy48X3-EDjVXSZa5w8=420"> <span> <strong>Google Cloud's open ecosystem for Apache Iceberg (5 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> Google Cloud now offers enterprise-grade Iceberg integration via BigLake tables, Google Cloud Storage (GCS), and an Iceberg REST Catalog API, strengthening Apache Iceberg's position as the leading cross-platform standard for scalable, interoperable data lakehouses. 
</span> </span> </div> </td></tr></tbody></table> <table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block"> <span> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.velodb.io%2Fblog%2F1463%3Futm_source=tldrdata/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/MnD3wuMeSjuEn1j_2d_9NwLWvBN6nRRAUObAqMjAWzI=420"> <span> <strong>The Ultimate OLAP Showdown: Apache Doris vs. ClickHouse vs. Snowflake (7 minute read)</strong> </span> </a> <br> <br> <span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;"> VeloDB's benchmark claims Apache Doris delivers 2.5-14x faster queries and up to 50x better price-performance than ClickHouse and Snowflake in join-heavy scenarios, including a popular simulated coffee shop dataset, TPC-H, and TPC-DS. </span> </span> </div> </td></tr></tbody></table> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td align="left" style="word-break: break-word; vertical-align: top; padding: 5px 10px;"> <p style="padding: 0; margin: 0; font-size: 22px; color: #000000; line-height: 1.6; font-weight: bold;"> Want to advertise in TLDR? 💰 </p> <div class="text-block" style="margin-top: 10px;"> If your company is interested in reaching an audience of data engineering professionals and decision makers, you may want to <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fadvertise.tldr.tech%2F%3Futm_source=tldrdata%26utm_medium=newsletter%26utm_campaign=advertisecta/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/jAA_6wp7pVNeGYwwxEfss5gtKYsvPBqdY53OWJ1m4_0=420"><strong><span>advertise with us</span></strong></a>. </div> <br> <!-- New "Want to work at TLDR?" section --> <p style="padding: 0; margin: 0; font-size: 22px; color: #000000; line-height: 1.6; font-weight: bold;"> Want to work at TLDR? 
💼 </p> <div class="text-block" style="margin-top: 10px;"> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fjobs.ashbyhq.com%2Ftldr.tech/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/HIMkiRyHCVbgCZjJvM6W1lQ5RwqwhZlXd9DNHTALzrw=420" rel="noopener noreferrer" style="color: #0000EE; text-decoration: underline;" target="_blank"><strong>Apply here</strong></a> or send a friend's resume to <a href="mailto:jobs@tldr.tech" style="color: #0000EE; text-decoration: underline;">jobs@tldr.tech</a> and get $1k if we hire them! </div> <br> <div class="text-block"> If you have any comments or feedback, just respond to this email! <br> <br> Thanks for reading, <br> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fin%2Fjoelvanveluwen%2F/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/r1LqSe0YRoaoW7j1CYu8bgzRLt3Oz8xEGVi_jUTXrhU=420"><span>Joel Van Veluwen</span></a>, <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fin%2Fjennytzurueyching%2F/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/drKcEwN90X52DTCSN6jVIiFuavNHkB49_1o8JFkfpwk=420"><span>Tzu-Ruey Ching</span></a> & <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fin%2Fremi-turpaud%2F/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/GcoxI8JG8ePe1CpMkR4Z-D16bQ9VP9OitF5iYPeW_lI=420"><span>Remi Turpaud</span></a> <br> <br> </div> <br> </td></tr></tbody></table> <table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;"> <div class="text-block" id="testing-id"> <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Ftldr.tech%2Fdata%2Fmanage%3Femail=silk.theater.56%2540fwdnl.com/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/YxPWLlkydVf1kInh_LGEED9PPIeetGpDSSysAIjvUpg=420">Manage your subscriptions</a> to our other newsletters on tech, startups, and programming. 
Or if TLDR Data isn't for you, please <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fa.tldrnewsletter.com%2Funsubscribe%3Fep=1%26l=037ede50-92cc-11ee-b0f2-b761aa2217ad%26lc=1670a604-84b7-11f0-bcf5-55fc1d40139c%26p=a1095c30-86e1-11f0-b178-b73fd6ae4cf4%26pt=campaign%26pv=4%26spa=1756720867%26t=1756721151%26s=1ef71bd137654bc107ffe042fda79434d0dd9c4119c659f639af20cabdbb4a05/1/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/ztC4EhA5_uniXx0_uyo3EXDf6FbgcLo4R2wfwzM64Xk=420">unsubscribe</a>. <br> </div> </td></tr></tbody></table> </td></tr></tbody></table> </td></tr></tbody></table> </td></tr></tbody></table> </td></tr></tbody></table> <img alt="" src="http://tracking.tldrnewsletter.com/CI0/0100019904bd7d38-d614144f-5627-48e2-a836-134b076a1424-000000/hJ8cvoLDKVsETXWb-gFPyekI7HidBnP5rS887Qhldmw=420" style="display: none; width: 1px; height: 1px;"> </body></html>