Undergraduate Upends a 40-Year-Old Data Science Conjecture
https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/
--
Ah, this really hits close to home. Back in the distant past, I was working on a problem that challenged the accepted wisdom of the time. When I proposed it to my ‘advisor’, it was dismissed outright because it “violated” what we thought we knew about vanishing gradients. Years later, a paper proved not only that the idea was possible, but that it was actually quite effective.
I’ve seen the same pattern throughout my career at various companies: deeply ingrained assumptions that only get questioned during those rare, paradigm-shifting moments, our own little 1492s. It’s fascinating how often we realize, in hindsight, how wrong we were.
That’s why this hash table breakthrough genuinely makes me happy. Here’s an undergraduate who simply… went ahead and disproved Yao’s 40-year-old conjecture that uniform hashing is optimal, probably because they didn’t know they “weren’t supposed to” question it. They just saw a problem, approached it with fresh eyes, and ended up improving probe complexity beyond what the field thought was possible.
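For context, “probe complexity” here is the cost model for open-addressed hash tables: how many slots you have to inspect before an insertion or lookup succeeds. Yao’s conjecture concerned uniform probing, where each key examines slots in its own random order. Here is a minimal toy sketch of plain uniform probing and its cost, my own illustration and not the paper’s construction:

```python
import random

def insert_uniform(table, key):
    """Insert `key` with uniform probing: the probe sequence is a
    random permutation of all slots derived from the key's hash.
    Returns the number of probes needed to find an empty slot."""
    n = len(table)
    # hash() is stable within a single run, so the key always
    # regenerates the same probe order.
    order = random.Random(hash(key)).sample(range(n), n)
    for probes, slot in enumerate(order, start=1):
        if table[slot] is None:
            table[slot] = key
            return probes
    raise RuntimeError("table is full")

# Fill a 1024-slot table to 90% load and look at insertion costs.
n = 1024
table = [None] * n
costs = [insert_uniform(table, f"key-{i}") for i in range(int(0.9 * n))]
print(f"max probes: {max(costs)}, avg probes: {sum(costs) / len(costs):.2f}")
```

The classic analysis says insertions at load factor α cost about 1/(1−α) probes in expectation under this scheme; the surprise of the new paper is that a smarter (non-uniform) probing strategy does asymptotically better in the worst case, which is exactly what everyone assumed was impossible.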
It’s kind of vindicating, you know? It’s a powerful reminder of why we need to **always keep questioning our assumptions in computer science**, especially the ones that feel too fundamental to challenge. Sure, theoretical frameworks are essential for progress, but real breakthroughs often come from someone who hasn’t yet learned what’s considered “impossible.”
And the fact that this happened with hash tables, literally one of the first data structures we teach, makes it even better.
It really makes me wonder what other “solved” problems in CS are just waiting for someone to revisit with a fresh perspective.
But it’s also a warning: as we lean more on machine learning and automated reasoning, are we limiting discovery by sticking too closely to established theories? Finding the balance between leveraging proven frameworks and staying open to challenging core assumptions is crucial for real progress.
Link to the paper: https://arxiv.org/pdf/2501.02305
Tuesday, Feb 11, 2025
You shouldn’t treat your teams like government entities.
When a team hides behind complex processes, it’s often a sign their goals aren’t aligned with the company’s objectives. The problem worsens when process-driven individuals cluster together—they become the bottleneck.
Introducing someone with strong agency into this environment leads to two possible outcomes: either they’re charismatic enough to build alliances and challenge the status quo, or they’ll create shadow workflows, trying to get things done on their own.
Neither approach is scalable. The focus shifts from accomplishing the work to simply navigating the system to make the work possible.
In the long run, this burns out everyone involved.
The goal is simple: processes should enable work, not get in the way. Align teams with company goals, give them the autonomy to act, and keep processes lean. High-agency people will drive change, not fight the system. This is how you kill bottlenecks and build a culture that gets things done.
Reviewing my notes for the thesis that never came to be, I find some pearls like:
“In any graph, the relationships between nodes are more important than the nodes themselves. This fundamental principle extends beyond graphs to any network or dataset, where the interactions between entities often reveal deeper insights than the entities alone. By analyzing how elements connect and influence one another, we can uncover patterns and dynamics that remain hidden when examining individual components in isolation.”
Only to see a red correction mark asking for a source.
No #fosdem for me this year, so I spent the weekend catching up on some papers that Andy Pavlo mentioned during a talk last year at P99 CONF.
https://www.vldb.org/pvldb/vol17/p2115-leis.pdf
https://www.vldb.org/pvldb/vol16/p3335-butrovich.pdf
On one side, there is a commercially driven push from cloud providers to treat databases like any other stateless system. In reality, however, it’s not that straightforward.
Meanwhile, some parts of academia are moving in a different direction—bypassing OS abstractions and moving user-land operations into the kernel via mechanisms like eBPF. This is a tricky move, since it implies that other abstractions, like kernel resource management, won’t be as useful in this context.
If you walk far enough in one direction, you’ll return to where you started. If this approach succeeds, it feels like we’re moving back toward big, powerful machines running high-performance transactional databases with satellite applications—Big Iron all over again.
It’s a long shot, but if we assume this works, who wins in this scenario? Not AWS or Azure, for sure.