May/Final Sabbatical Report: My Year as a Data Scientist

May 4, 2022

This is a continuation of the series of my sabbatical reports. Here are the previous entries.

Summary: I am officially done working for the bank—I turned in my computer to them today. The timing was almost perfect: I was able to finish both of my projects, and I was able to document them well. I ended up creating ten videos explaining the most challenging parts of the project, so I hope that maintaining the code will be much easier.

The one open thread is automating the reports. They use Windows Task Scheduler, which gave me issues until the end. I am simply trying to schedule the running of a .bat batch file, where the .bat file calls the appropriate .py Python file. I was able to confirm that both the Python and the batch files worked when I ran them manually. However, I was not able to get them to work through Task Scheduler. Strangely, Task Scheduler caused Python to start running (which is good), followed by the Python program using a lot of RAM (which is good, since it is a data-heavy program), followed by the Python program using a lot less RAM (which is bad), followed by Python using a lot of RAM again (which is weird), followed by Python closing (which is bad). Ultimately, the Python program never produced a report. If anyone knows what is going on with this, please let me know.
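One way to narrow down a failure like this is to have the script write its own log file, since a scheduled run has no visible console. The sketch below is only a diagnostic aid, not a fix. It records the interpreter, the working directory, and the user the job runs under, because those can differ between a manual run and a scheduled one, and it saves the traceback if the job raises. The names `run_with_logging`, `build_report`, and `report_run.log` are hypothetical, and the sketch cannot capture a process that is killed from outside Python.

```python
import logging
import os
import sys
import traceback
from pathlib import Path


def run_with_logging(job, log_path):
    """Run job() and append environment details and any exception to log_path.

    Returns 0 on success and 1 on failure, so the value can be passed to sys.exit().
    """
    log_path = Path(log_path).resolve()
    logger = logging.getLogger("scheduled_job")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        # These values often differ between a manual run and a scheduled run.
        logger.info("interpreter: %s", sys.executable)
        logger.info("working directory: %s", os.getcwd())
        logger.info("user: %s", os.environ.get("USERNAME", "unknown"))
        job()
        logger.info("job finished without raising")
        return 0
    except Exception:
        # Keep the full traceback; there is no console to show it.
        logger.error("job failed:\n%s", traceback.format_exc())
        return 1
    finally:
        logger.removeHandler(handler)
        handler.close()


# Hypothetical usage at the bottom of the report script:
#
#     if __name__ == "__main__":
#         log_file = Path(__file__).with_name("report_run.log")
#         sys.exit(run_with_logging(build_report, log_file))
```

If the log shows an unexpected working directory, relative file paths in the script would be a natural first thing to check. That is a guess, though, not a diagnosis.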

I learned a ton this year. This isn’t hard given that I started the year knowing almost nothing about data science. I knew Python reasonably well, but I didn’t actually use those skills much this year. It was more about knowing a bunch of one-line Pandas commands. I now know much more about the data science process (e.g. testing and training data), and I understand a bunch of the models.

Most importantly, I have a lot of muscle memory on how to work with data in a Jupyter notebook. I think that this could be really helpful in teaching statistics. I previously used Jupyter notebooks in my statistics courses, but my use of the notebooks is probably best described as “clumsy.” I would have to Google every third step, and I wasn’t comfortable doing it on the fly in the classroom. I am very comfortable with it now.

One skill that I didn’t become fluent in is creating cool graphs and visualizations. I did a bit of this at the end of 2021, but I didn’t do enough to get fluent.

I am glad that I did this. I learned a lot, I have a lot to give to the students (particularly with respect to advising students who are interested in data science), and I made some great friends (I miss them already).

That said, I am grateful to be a professor again. I am itching to start planning my classes, and I think that I have come out of this experience a changed teacher. I will blog about that later this summer.

