Notes on the First Year of Grad School

Aug 26, 2015

When deciding which program to join for my PhD I came across Jerzy's blog. His post on the 1st semester was very helpful in confirming that CMU was the place for me. You definitely should read all of them: 1st, 2nd, 3rd, and 4th thus far. Anyways, as an act of blatant imitation, here's my own review of the first year.

Attention Conservation Notice:¹ Of interest mostly to other PhD students. Sample size of one; may not replicate.

Lessons Learned

Before I start in on the presumably boring details, I found it helpful to distill the entire experience into a few short "lessons".

Know thyself

There's no doubt that the first year of grad school is a humbling experience. You're exposed to the many things you don't know and you must work very hard to start correcting that situation. Without a good sense of your limitations this is incredibly exhausting. So knowing when you're tired and frustrated and knowing what to do to fix that is essential. I find that exercise and a book are helpful.

That being said, it's also nice to "internalize that you are, in fact, a badass", to borrow a phrase from Cosma Shalizi. It's quite a morale boost to realize just how far you've come in a relatively short amount of time. So while in the middle of a measure theory problem set, just remember that only a few months ago you couldn't even understand the statement of the problem you're solving.

Know your tools

On a more quotidian level, this year has been an education in the mechanics of getting work done. With problem sets due every week, homework to be graded, papers to read, and research to be done, you need to get on top of things. Investing a little time in improving the process by which you accomplish these tasks will pay off in the long run. All that to say, my emacs-fu has gotten much stronger this year.

I'll just list a few of the tools and commands which I now find indispensable:

emacs, especially elfeed (rss), mu4e (email) and above all org-mode (uncategorizable)
git and github (students get free micro accounts)
ssh
rake

Know the people around you

One of the aspects of CMU which had escaped my notice before deciding to attend was the wonderful culture. The students are very friendly and the professors helpful. This is nice because you occasionally hear stories of the academic version of Thunderdome²: "two men enter, one leaves" (with a diploma). Instead, things are very cooperative, which is fortunate since you can benefit from specialization of knowledge: you help me understand real analysis and I help you understand programming. On a less transactional level, you spend quite a bit of time with these people and, assuming you're staying in the field, you will never quite escape them; so it's just as well that you befriend them.

Coursework

Now for the nitty-gritty of the first year. A great deal of time and effort in the first year was spent on coursework. CMU front-loads the coursework so that we're done with required courses very early. Thus the courses are meant to get you up to speed for research.

Fall

10-701 Machine Learning

A whirlwind tour of various algorithms ranging from the familiar linear and penalized regression to the new-to-me perceptron and SVM. My biggest gripe was that they required us to code in Matlab's open-source cousin Octave.

36-699 Statistical Immigration

This was excellent. This was not a class, merely a scheduled time for each of the faculty to give a short talk about their work. It was extremely helpful not only so I could recognize all of the faculty, but so that I had a first approximation to what everyone was doing. Several professors also took this time to advertise ADA projects which was also helpful.

36-705 Intermediate Statistics

Coming from a heavily Bayesian school (Duke) I had a lot to learn/refresh about classical statistics. Larry Wasserman's course was the way to quickly get up to speed. More or less following the content of his All of Statistics³, it was the perfect way to get into the swing of things.

36-707 Regression Analysis

All of the other courses were heavy on math and theory, so 707 served as our connection with "real" data analysis. I thought I had regression all figured out, but this class helped fill in some of the gaps in my education.

Spring

36-702 Statistical Machine Learning

Where 701 was a whirlwind introduction to several methods, 702 went back and looked at the theory behind those methods. Co-taught by Larry Wasserman and Ryan Tibshirani, this course was my favorite thus far.

36-752 Advanced Probability

This was our introduction to Measure Theory. I definitely benefited from being a math major in college. The actual content of my undergrad courses was not particularly helpful, but the mental patterns for figuring out proofs were crucial.

36-757 Advanced Data Analysis

This was also not exactly a course so much as a scheduled time to give short talks about our progress on our ADA projects. It's been great to have a variety of opportunities to practice speaking. The only way to get better is to practice, and this course gave us plenty of feedback on our skills.

Not Coursework

Of course, not everything is about coursework.

ADA Project

All PhD students in the department have an Advanced Data Analysis (ADA) project which they typically complete over the course of their first spring and second fall. Mine is working with recordings of neural activity in Parkinsonian mice. The experience working with real data has been educational and it serves to connect all of the work we've been doing in our courses with actual data analysis.

TAing

I was a TA for 36-225 in the fall and 36-217 in the spring. While not necessarily the most exciting aspect of the program, I've framed the work as practice at conveying statistical ideas to non-statisticians. Office hours are enjoyable, although grading papers is a grind.

Summer

Over the summer I worked as a TA for our summer research program for undergraduates. This was a great deal of fun. Once a week I ran an R tutorial for the entire program and the rest of the time I worked with a small group on their project. Their dataset was gene-expression levels in autistic and control brains and the goal was finding differentially-expressed genes. It was an excellent context in which to bring up multiple-comparison testing, permutation tests, and correcting for confounding variables.
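For readers who haven't met these ideas, here is a minimal sketch of two of them: a permutation test for one gene, and a Benjamini-Hochberg correction across many genes. This is not the analysis the students ran; the function names `permutation_pvalue` and `benjamini_hochberg` and all of the numbers are made up for illustration, and it is written in Python rather than the R we used in the tutorials.

```python
import random

def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper: shuffles the group labels n_perm times and counts
    how often the shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)  # computed before any shuffling
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Standard step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha
    and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Made-up expression values for a single gene in two groups.
cases = [5.1, 5.3, 5.2, 5.4, 5.0, 5.2]
controls = [3.0, 3.1, 2.9, 3.2, 3.0, 3.1]
p = permutation_pvalue(cases, controls)

# Made-up p-values standing in for one test per gene.
flags = benjamini_hochberg([0.001, 0.02, 0.04, 0.5])
```

With thousands of genes, testing each one at the usual 0.05 level would flag hundreds by chance alone, which is why the correction step matters. Neither function addresses confounding variables; that would need a model with covariates, which is beyond this sketch.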

Non-academic Activities

Whether it's playing intramurals, walking in Schenley Park, or getting books from the public library, it's good to get out of the department. I've particularly enjoyed setting up my hammock on Flagstaff Hill and relaxing with a good book from the Carnegie Library of Pittsburgh.

A lot of the time the department has come along, as our cohort has been very good at socializing outside of the department. I've even hosted a few board⁴ game nights at my apartment.

Footnotes:

¹ As further evidence of my lack of originality with this post I decided to borrow from Cosma Shalizi, although in the course of writing this I actually followed his link and apparently he got it from Bruce Sterling.

² I'll admit, this would make thesis defense more interesting. As always, xkcd got there first.

³ I don't quite know why he needed to write All of Nonparametric Statistics when it clearly must be a subset of All of Statistics.

⁴ Protip: be sure to specify board games, otherwise everything devolves into a party game like Cards Against Humanity.

Filed under: phd
