Harvard’s Advanced Python for Data Science: My Review

When I started my job as a hospitalist, I was very excited about the week-on/week-off schedule. Most hospitalists typically put in 80 to 84 hours in one week, followed by a week off. I traveled a lot during my off weeks, but eventually I reached a point where I felt it was time to do something a little more productive. I was spending a lot of time on Quora, reading not only about “ten most creepy pictures” and “how does Zuckerberg spend his billions”, but also about new technologies. I got really interested in natural language processing (NLP) and the concepts behind it, interested enough that I decided to learn Python. This was followed by a bunch of Amazon orders for books like “learn python in one day”. I found Python very intuitive. Even for someone like me, with no background in computer science and minimal experience in any programming language, it was quite easy to learn. The best way to learn any skill is by practicing it, so I decided to work on some projects. I built a website (tags4insta) that scrapes Twitter and Instagram for hashtags that appear alongside a user-provided hashtag or phrase. Working with online tutorials, I also came up with a project for work: extracting meaningful information from patient feedback comments using NLP. I was able to get it published in a reputable journal. The publication really encouraged me to learn more, and I decided to actually pursue a degree in data science. I believed that, as a doctor, I have a slightly different perspective on the immense amount of healthcare data in a hospital than somebody who is not as closely linked to healthcare.

After some research, I found many programs offering online graduate degrees in data science. My two favorites were UC Berkeley and the Harvard Extension School (HES) data science degree. I decided to go with HES mainly because of the cost and the option to take courses on campus. The admission process is rather simple. You take two admission courses, and if you do well, you get into the master’s program. One of the two courses was Advanced Python for Data Science (CSCI E-29). I was pretty excited about that course and felt rather comfortable with Python. Little did I know how little I knew.

The course seems to be in great demand. I missed registration once because it filled up in two days, and I had to wait for the next term. Therefore, it is important to review what the course requires and take care of it ahead of time. The class meets once a week on campus for the lecture, which is streamed online. Since flexibility is a goal of the course, recordings are also available a day or so later in case you miss the live session. You are assigned a TA you can communicate with, and online office hours and sections supplement the lectures. Each week you get a problem set in which you implement the concepts covered in that week’s lecture. After each assignment, you are also expected to review your peers’ work, for which you receive part of the credit.

The course places a lot of emphasis on doing things “properly”, the way you would in a professional setting. That was actually quite challenging for me. GitHub commits, documentation in the code, Travis CI, Code Climate badges, Docker: all things that may be routine for a seasoned programmer, but that I would never have thought of as a self-taught programmer. It did get easier as I went through the course, and now I feel that’s how I will be doing all my projects!

Now let’s talk about the Python covered in the course. As expected, it really is advanced Python. It covers things that most of the usual online tutorials do not, like descriptors, decorators, composition, atomicity and metaprogramming. The course is big on functional programming and on treating code as data. Once again, these are concepts that I do not think I would have learned on my own. The problem sets force you to read up a lot. Many times while working on a problem set, it would take me hours to figure out a solution. Once I had it, I would realize it needed just a few lines of code. Sections are held several times a week, and there the TAs elaborate on the concepts used in the problem set or answer specific questions. However, I could not attend most of the sections because of time constraints.
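
For anyone who, like me, had never met these ideas before, here is a minimal sketch of two of them. A decorator is a function that wraps another function to add behavior. A descriptor is an object that controls attribute access on a class through `__get__` and `__set__`. This is an illustrative toy, not course material. `count_calls`, `Positive` and `Patient` are hypothetical names chosen for this example.

```python
import functools


def count_calls(func):
    """Decorator: wrap func and record how many times it has been called."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


class Positive:
    """Descriptor: only allow the attribute to be set to a positive number."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; pick a storage name.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself, not on an instance
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.private_name[1:]} must be positive, got {value}")
        setattr(obj, self.private_name, value)


class Patient:
    # Hypothetical example class: both attributes are validated by Positive.
    weight_kg = Positive()
    height_m = Positive()

    def __init__(self, weight_kg, height_m):
        self.weight_kg = weight_kg  # goes through Positive.__set__
        self.height_m = height_m

    @count_calls
    def bmi(self):
        return self.weight_kg / self.height_m ** 2


p = Patient(weight_kg=70, height_m=1.75)
print(round(p.bmi(), 1))  # 22.9
print(Patient.bmi.calls)  # 1
try:
    Patient(weight_kg=70, height_m=0)
except ValueError as err:
    print(err)  # height_m must be positive, got 0
```

The decorator adds call counting without touching the body of `bmi`. The descriptor puts the validation in one place, so every positive-only attribute reuses the same check.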

The course also introduces you to many Python-based technologies. Luigi, by Spotify, is one of them. For web and database work, Django is their choice, along with Django REST Framework. You also get quite a bit of introduction to Dask, and you learn how to work with AWS S3 buckets. There are a lot of things that you do not formally study but work with during the problem sets, and they really trigger curiosity and excitement. For example, in one assignment you use pre-trained style transfer models to stylize your pictures. In another, you compute the cosine similarity of different documents. My favorite was an assignment built with Django and Django REST Framework. In it, you read Yelp reviews from a database, perform aggregations, and display the results on a website, along with a graph.
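
Cosine similarity compares two documents by turning each one into a vector of word counts and measuring the angle between the vectors: cos θ = (a · b) / (‖a‖ ‖b‖). For non-negative counts, the result ranges from 0 (no shared words) to 1 (same proportions of words). Below is a minimal pure-Python sketch. It assumes a simple regex tokenizer, a tiny made-up stop-word list and invented example reviews. It is not the course's problem-set code, which may have used different tools.

```python
import math
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this sketch); real
# projects usually use a fuller list or TF-IDF weighting instead.
STOP_WORDS = {"a", "an", "and", "the", "was", "is", "of", "to"}


def bag_of_words(text):
    """Lowercase the text, split it into words and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(doc_a, doc_b):
    """Return cos(theta) between two documents' word-count vectors."""
    a, b = bag_of_words(doc_a), bag_of_words(doc_b)
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as similar to nothing
    return dot / (norm_a * norm_b)


reviews = [
    "The staff was friendly and the food was great",
    "Great food, friendly staff, slow service",
    "Parking was impossible and the wait was long",
]
print(round(cosine_similarity(reviews[0], reviews[1]), 2))  # 0.82
print(round(cosine_similarity(reviews[0], reviews[2]), 2))  # 0.0
```

Without the stop-word filter, words like “the” and “was” would make the unrelated third review look far more similar to the first one than it really is.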

As for my classmates, everybody seemed very nice and helpful. I never interacted with anyone directly, but on the forum, any question posted by one student was answered within 30 minutes by another. During peer review, I almost felt embarrassed by my code while reading theirs, because most of them seemed really good at it. But I went in with the mindset that I was in the course to learn, knowing my skills were very basic since I have no programming background. The whole group was extremely supportive, and it was somewhat comforting to see that others mostly got stuck at the same points I did. Before googling a solution, I always found it helpful to check the forum (Piazza) first, and almost every time I found a post about what I was looking for. There were two exams, a midterm and a final. The exams were conducted online through Proctorio, which captures your screen, camera, audio, clipboard and running applications.

The last part of the course is a project. Students submit a proposal, which is reviewed by peers. They then either record a video presentation about the project or present it live online, and finally they submit the project repository. I worked on a Django-based project that used Django REST Framework to share public data and Luigi to generate a daily report after running some statistics. I am still waiting for my grade, but I feel very good about my project.

To get into the master’s program, I need at least a B. I am hopeful for a B and will take it, but I am also prepared to take the course again if needed. COVID-19 happened while I was taking the course and work got very busy, so it was really hard to keep up with the coursework. I missed most sections but listened to the lecture recordings and participated in discussions on Piazza. I was late on my assignments in the last few weeks, but the faculty was kind enough not to deduct any points, for which I am very grateful.

With this review, I hope to help someone who is in the shoes I was in before starting the course. The syllabus alone does not give much insight into how the course actually works. I tried hard to find a good review of the course but couldn’t, so I decided to write one for someone with very little programming experience. I have learned a lot, and I am already thinking not only about new projects I want to work on but also about improving old ones.
