Sabbatical November Report: My Year as a Data Scientist

This is a continuation of the series of my sabbatical reports. Here are the previous entries.

The skills needed to be a data scientist.

First, note that gasstationwithoutpumps pointed out that my explanation of a novel use of joins is, well, pointless (he was kinder about it). I haven’t yet worked out whether the problem was with my coding or my explanation of it, but I am betting it was the former while hoping it was the latter.

I am now halfway through my expected tenure as a data scientist. I have learned a lot. I have a lot more to learn.

I have moved on to building models, which is what most people think of when they think about data science (if they think about anything at all). Basically, this is the part where I make predictions based on the data. Since I am trying to classify data, I am playing around with the following tools.

AdaBoost
k-Nearest Neighbors
Logistic Regression
Naive Bayes
Neural Networks
Random Forests
Support Vector Machine
XGBoost

I have a couple of models that seem to do better than the others, and now I am trying to milk what I can out of the models to improve their performance. I will probably be working on the same thing for the rest of the month.
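For concreteness, here is a minimal sketch of how a comparison like this could be set up with scikit-learn. The synthetic data, the default hyperparameters, and the helper name `compare_models` are all assumptions made for illustration and are not the actual data or code from my job. Neural networks and XGBoost are left out only to keep the sketch short and limited to scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; the real data set is not shown here).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Distance- and margin-based models get a scaler in front of them.
models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "k-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}


def compare_models(models, X, y, cv=5):
    """Return {model name: mean cross-validated accuracy}."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Accuracy is only one possible yardstick. For imbalanced classes, a different `scoring` value (such as `"f1"` or `"roc_auc"`) is probably a better choice.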

How academia and business are different.

See the section about my feelings.

How will this experience influence my teaching?

I spoke last month about my thoughts about ungrading. I think that this sabbatical experience is reinforcing the thought that grading really isn’t good. I am having to train myself how to function at this job, not having been trained directly in what to do—just like my students will do at their first job. Lifelong learning and all.

This is not just about me being naive (which I admit to). I understand that some students won’t respond well without the grade incentives. So I am not being idealistic. Rather, I think that it is a valuable skill to be able to self-assess (which I am assuming is part of ungrading), and then learn to address your weaknesses.

My feelings about being in industry.

I am struggling a bit. I am recognizing exactly how spoiled we are as professors. I don’t get a break at Christmas. Christmas is on a Saturday, and I will be back to work on the Monday (unless I use a vacation day, which I might). I do get paid for a full day on Friday if I work a half-day, which is nice. Frankly, working in industry requires a certain endurance that I don’t exactly have right now. I will do fine, but I can tell that my body/mind/spirit is expecting a break that it will not get. I might get paid less as a professor than I would in industry, but I certainly appreciate the time off.

Tags: Data Science, Sabbatical

This entry was posted on December 2, 2021 at 10:17 pm and is filed under Uncategorized.

8 Responses to “Sabbatical November Report: My Year as a Data Scientist”

Andy Rundquist Says:
December 4, 2021 at 3:44 pm
(caught up!)

Wow I love this series! I’m so glad you keep adding to it. What a gold mine for people like me who both are interested in cool sabbatical approaches and are interested in data science as a thing for students to learn.

I’m really struggling with some of the machine learning approaches, because, while they find cool patterns, they rarely let me understand what’s really going on. I’m using them a ton to try to understand student retention and persistence based on courses they take and I think I’m seeing patterns that we can address, but I’m not as confident as when I’m the one slowly building the model with particular terms representing particular things.

How are you doing with that? Am I misrepresenting things with that framing?
- bretbenesh Says:
  December 7, 2021 at 3:50 pm
  I am just beginning with the machine learning, but my experience largely matches yours. How closely it matches depends on the model (with Naive Bayes it is pretty easy to understand, in principle, what happens, whereas with neural networks it is impossible to tell what is going on).
  
  I am betting that you already know this (I am pretty certain that you know a ton more about data science than I do; I am mostly writing this to remind myself in the future), but the feature_importances_ attribute (or permutation_importance, in some cases) combined with a knowledge of how the model works can provide some insights.
  
  But—yes—these can be a bit of a black box. Depending on details (like how much time you have), I might try approaching it from both ends. Slowly build the model yourself, build the machine learning black box along with it, and then use what you learn from the black box to inform your model (and vice versa).
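  A minimal sketch of the two importance measures mentioned in the reply above, assuming scikit-learn, a random forest, and synthetic data (none of which come from the actual project): `feature_importances_` is an attribute of a fitted tree-based model, while `permutation_importance` is a function in `sklearn.inspection` that works with any fitted estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, for illustration only).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: one value per feature, available after fitting.
impurity_importances = model.feature_importances_

# Permutation importances: the mean drop in held-out score when a single
# feature's values are shuffled, averaged over n_repeats shuffles.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
perm_importances = result.importances_mean

for i, (imp, perm) in enumerate(zip(impurity_importances, perm_importances)):
    print(f"feature {i}: impurity={imp:.3f}, permutation={perm:.3f}")
```

  The two measures can disagree, especially when features are correlated, so it seems safest to treat them as hints about the model and not as explanations of the underlying process.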

Solvable by Radicals
