A Month in Questions

On Doing "Research"

Saturday, May 31, 2025

Here I present some notes given to Claude to format and edit into a rushed blog post for the month of May!

We're in an age where making datasets is much easier. The tools are better, the data is more accessible, and the barriers to entry have never been lower. But easier access to data doesn't automatically translate to meaningful research—it just changes the game.

Impact Statements and Research Taste

Setting a research agenda is valuable, but one thing I'm trying to get better at is formulating impact statements. Given your novel contribution, what do you expect to change? I think it's possible to produce empirical results and publish them without ever clarifying this to yourself, but the work is much more fruitful if you can clearly specify what it is that you care about.

Writing an impact statement is good practice in asserting your judgments. Then, depending on what happens in the world, you get feedback on your intuitions and grow as someone trying to build research taste. This feedback loop is essential: it's how you develop the ability to distinguish between work that matters and work that simply fills pages.

Finding Your Research Identity

Given my affinity for writing literature reviews, I can say that I enjoy empirical analysis as a type of work. Knowing which kind of work you're interested in producing lays some basis for the type of insight you should expect to produce. Of course, don't let this stymie the contributions you could make.

Along these axes, there are a number of contributions I could make in a given work:

I could try to come up with a novel dataset (structuring unstructured data from neglected sources)
I could draw some connection between parameters that people hadn't considered (backing this up, of course)
I could also validate (or disprove) theoretical conclusions with real-world evidence (e.g. Epoch's work on the Chinchilla paper)

The RA Trap and Escape Route

I do think being an RA is valuable, especially if you're interested in the broader appeal of research. It's also a good environment for getting a sense of the literature, especially since most of your tasks will involve reading and synthesizing it. I think it's a decent opportunity to see what it looks like to come up with ideas and so on.

But it can definitely be a trap.

Here, I think I had to readjust internally and orient myself around the belief that I could come up with new, important ideas. Then it was important to test and transmit these ideas. Once you've developed some subject-level interest, start reading papers differently: pay attention to the future work and directions that authors outline, and come up with hypotheses (grapple with questions at a gears level, that is, assert claims and intuit in what ways they may be true). Next, write these down.

Once you're here, it's a great time to find mentorship. I think I'm at the point where having someone to help refine projects and plans and advise on writing papers is very valuable. Since you've done a lot of the heavy lifting, the people you reach out to go from advice-givers to potential collaborators and advisors.

This is an environment where I think you can grow a lot and build fruitful mentorship. Don't be afraid to sandbox ideas; perhaps my biggest regret is not speaking ideas into existence sooner. I think the biggest gap is going from the ideas in your mind to a workable project, and this is where a forgiving mentor who gives you direction while you explore helps a ton.

The Value of Tedium in an Automated World

What is the value of tedium? What will change if we automate a considerable amount of mathematics and other fields? For example: what is the value of doing your own literature reviews, sketching your own proofs, and so on?

Automated tools can formalize a proof written with pen and paper, but we could head into a world where the sketches themselves are generated by LLMs and then compiled into Lean or another formal, machine-readable representation. This makes me think of the 3-D representations associated with John Baez, and of a question like: could you formalize a 3-D, physical, functional, simulated representation of the world from your real-world model?

It's more of a vessel for people in other fields who already use math to formalize their domain-specific models, giving them an extension of that ability. For example, an economist wouldn't have to solve an integral by hand.
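
For a sense of what a formal, machine-readable representation looks like, here is a minimal Lean 4 sketch. The informal claim and the theorem name add_comm_example are hypothetical, chosen only for illustration; the proof simply appeals to the core library lemma Nat.add_comm rather than anything an LLM pipeline produced.

```lean
-- Informal pen-and-paper claim:
--   "Adding two natural numbers in either order gives the same result."
-- Formal statement and proof (theorem name is illustrative):
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point is less the theorem than the shape: once the claim is stated this way, a machine can check it without trusting whoever (or whatever) wrote the sketch.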

But we lose some essence of science once we start brute-forcing results and completely abstract away human input.

The tedium isn't just busywork—it's where intuition develops, where you build the tacit knowledge that lets you spot patterns and anomalies that automated systems might miss. The question isn't whether we can automate these processes, but whether we should, and what we lose in translation when we do.