Five Research Tips for First-Time Undergraduate Researchers
Mistakes that should only be made once
Published on 28 June 2013Today, I submitted my first extended abstract to the ASSETS 2013 Student Research Competition. This marks my first tangible output in undergraduate research. This isn’t quite the end of the story – I plan to continue gathering results and writing a paper for submission to CHI. It is, however, time for reflection.
My research endeavors (in human-computer interaction) have been an incredible learning experience. In this post, I want to talk about some of the more practical, elementary lessons that I learned about the research process. These are mostly time-saving lessons. Unfortunately, I missed a lot of these lessons when I did medical research in high school. My sincerest hope is that another undergraduate researcher, working on his or her first research project, will stumble upon this post and save some valuable time.
I have no doubt that grad students and experienced researchers know these already – these are for first-time lead researchers. Or, if you’re a grad student or Ph.D that disagrees with anything below, feel free to impart your wisdom by correcting or qualifying it (I place a lot of weight on the wisdom of the ancients – not that you’re ancient…).
1. Design and analysis: Find answers in the results and methods sections
Research papers aren’t only valuable for their graphs and conclusions. Chances are, there are similarly designed experiments floating around on Google Scholar (what papers are in your literature review?).
Although I initially relied on asking others, I found that consulting past work is an excellent way of finding answers to questions yourself. These include answers to questions like:
- What statistical analyses do others usually employ on experiments of similar design?
- How do others exclude outliers?
- How did others run experiments that had similar conditions?
- What metric is usually used to present errors in clicking tasks?
And many more. I still discussed my various approaches with other people before moving forward to make sure that I wasn’t misinterpreting the papers, but if you’re ever feeling clueless about anything, check the literature first.
2. Data Collection: If your data is simple, just store it as CSV
Really, just store it as CSV. Every data analysis application accepts CSV, and it saves time from parsing to store the raw results as CSV.
I went through a few iterations of data storage. First, I stored things in a MySQL database. Having interned at a database company last summer, the thought of appending my data to a plaintext file was horrifying.
I soon realized, however, that the overhead of storing, retrieving, and quickly inspecting values in a database was more trouble than it was worth. I opted to store my results (time, error, trajectory, detailed mouse logs, questionnaires – each with associated metadata) as JSON files, which made the logs easy to quickly inspect. Since I used MATLAB (more on that later) and JMP for analysis, I eventually parsed this JSON into numpy/MATLAB matrices, and finally, just CSV files. Oops!
Before you start gathering data, envision yourself processing the data. Keep in mind that you’ll most likely want to process your data as tables.
The caveat is that CSV isn’t too good at handling more complex data with parent-child relationships (though you could stringify lists into a cell).
3. Communication: Make it easy for your PI
Professors have a reputation for being busy, though my PI (research advisor) is receptive to meeting if I need it. One thing that I quickly learned is that meetings go much more smoothly if I make it easy for him. More concretely: when possible, I should generate ideas and present them to him for feedback, rather than having him do the heavy lifting.
After all, your PI is helping you, so he or she deserves your best effort to make the process less painful. These include things like being ready for meetings. At first, our meetings were pretty disorganized, but I learned that it’s a good idea to write up a very specific list of topics beforehand. This ensured that the meetings ran smoothly: demo, receive feedback, iterate through questions and discussion points, talk about next steps.
4. Data Analysis: Don’t use MATLAB for statistical analysis
For some reason (the reason being that I was only familiar with MATLAB and Excel for analysis), I thought it was a good idea to use MATLAB for doing statistical analysis and making graphs. While MATLAB has a statistics toolbox and graphing functionality, it’s not great for playing around with data and generating graphs quickly.
The choice of program is personal preference – or in my case, inheritance. After spending an embarrassing amount of time grouping my data in MATLAB, I watched my PI replicate my work in five minutes using the Graph Builder in JMP. I ended up using JMP, and I really like it. The Graph Builder is wonderful. The documentation is okay (MATLAB has great documentation).
There are a lot of options – R, Stata, SPSS. I tried both SPSS and JMP. My initial impressions is that SPSS has a pretty bad user experience, while JMP’s workflow seems better thought out (though minor things could be improved).
An auxiliary lesson is that it’s worth asking a grad student, postdoc, or PI about their workflow before beginning an endeavor. Unfortunately, the ones in my group work in a different office and weren’t immediately accessible, but I’ll definitely ask if I find myself in a new situation.
5. Graphing for papers: Excel is a good option
There’s graphing for the purpose of playing with data, then there’s graphing for the purpose of producing a good looking PDF.
For the latter, I found that neither JMP nor MATLAB would let me style my graphs with the flexibility and speed that I wanted. One of the postdocs said that she used R or Excel. Excel was actually a very good option, since I just wanted to change the fonts, size, and bar labels. Lesson: with all the fancy tools, don’t forget about Excel.
I suspect Python might have some good graphing libraries, although the only one that I’ve used is matplotlib
, which was basically MATLAB. It’s also possible to create graphs directly in TeX, although I would guess that it’s harder to iterate on the styling.