Sabbatical August Report: My Year as a Data Scientist

September 2, 2021

This is a continuation of the series of my sabbatical reports. Here are the previous entries.

I am continuing to work on the same project. I have all of the data put together (from four different databases), and now I am testing and validating it. This essentially means that I am looking at a random sample of my data and computing the results by hand.
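Here is a minimal sketch of that validation loop. It uses only Python's standard library rather than pandas. The function names, and the idea of storing the hand-computed answers in a dict keyed by record, are illustrative assumptions, not the actual project code. Fixing the random seed means the same sample can be drawn again later, which helps when rechecking after a fix.

```python
import random


def validation_sample(rows, k, seed=0):
    """Draw a reproducible random sample of k rows to check by hand."""
    rng = random.Random(seed)  # a private generator with a fixed seed
    return rng.sample(rows, k)


def mismatches(pipeline_results, hand_results, tol=1e-9):
    """Return the keys where the pipeline disagrees with the hand computation.

    A key that the pipeline never produced also counts as a mismatch.
    """
    bad = []
    for key, expected in hand_results.items():
        actual = pipeline_results.get(key)
        if actual is None or abs(actual - expected) > tol:
            bad.append(key)
    return bad


# Hypothetical usage: pick 5 of 100 record ids to work out by hand.
loan_ids = list(range(100))
to_check = validation_sample(loan_ids, 5, seed=42)
print(to_check)
```

After computing the sampled records by hand, passing the pipeline's numbers and the hand-computed numbers to `mismatches` lists exactly which records need a closer look.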

The skills needed to be a data scientist

I am using the same tools, although I hope to talk about more exciting tools next month.

I use a lot of joins, almost all of which are outer joins (either “left” or “full”). At various points in my past I have understood the different joins temporarily, but now I am fluent. I have been using joins for (I think) three purposes.

  • I use full outer joins to concatenate two data sets that contain the same information. For instance, we have one database that stores old data and one that stores new data. If I want to look at all of the data, I do a full outer join to combine them.
  • I use left joins (you could make it a right join if you like) to append new columns to a table without adding new rows. This is how I usually think of a left join.
  • I use left joins (again, you could make these right joins easily) to filter out data. This isn’t something that I thought of prior to this. Basically, I have a main table, but it has too much data. If I can create a second table that just has the rows I want, I can do “second table LEFT JOIN main table.” I don’t do this often—I am not in this situation a lot, and filtering usually works better—but I have done it.
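The three uses can be sketched in SQL, run here through Python's built-in `sqlite3` module on small made-up tables. All table and column names are hypothetical. One caveat: older SQLite versions have no `FULL OUTER JOIN`, so the first query emulates one with a `LEFT JOIN` plus the unmatched rows from the other side. The third query assumes the ids in `wanted` exist in the main table, since a left join keeps every row of the left table even when nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_loans (loan_id INTEGER, balance REAL);
CREATE TABLE new_loans (loan_id INTEGER, balance REAL);
CREATE TABLE borrowers (loan_id INTEGER, region TEXT);
CREATE TABLE wanted (loan_id INTEGER);
INSERT INTO old_loans VALUES (1, 100.0), (2, 200.0);
INSERT INTO new_loans VALUES (3, 300.0), (4, 400.0);
INSERT INTO borrowers VALUES (1, 'North'), (3, 'South');
INSERT INTO wanted VALUES (2), (3);

-- Use 1: combine old and new data. This emulates a full outer join on loan_id:
-- every old row (falling back to the new balance if the old one is NULL),
-- plus the new rows that have no partner in old_loans.
CREATE VIEW all_loans AS
    SELECT o.loan_id AS loan_id, COALESCE(o.balance, n.balance) AS balance
    FROM old_loans AS o LEFT JOIN new_loans AS n ON o.loan_id = n.loan_id
    UNION ALL
    SELECT n.loan_id, n.balance
    FROM new_loans AS n LEFT JOIN old_loans AS o ON n.loan_id = o.loan_id
    WHERE o.loan_id IS NULL;
""")

combined = conn.execute(
    "SELECT loan_id, balance FROM all_loans ORDER BY loan_id"
).fetchall()

# Use 2: append a column to old_loans without adding rows. A loan with no
# borrower record keeps its row and gets NULL, which Python sees as None.
with_region = conn.execute("""
    SELECT o.loan_id, o.balance, b.region
    FROM old_loans AS o LEFT JOIN borrowers AS b ON o.loan_id = b.loan_id
    ORDER BY o.loan_id
""").fetchall()

# Use 3: filter by putting the small table of wanted ids on the left.
filtered = conn.execute("""
    SELECT w.loan_id, a.balance
    FROM wanted AS w LEFT JOIN all_loans AS a ON w.loan_id = a.loan_id
    ORDER BY w.loan_id
""").fetchall()

print(combined)     # [(1, 100.0), (2, 200.0), (3, 300.0), (4, 400.0)]
print(with_region)  # [(1, 100.0, 'North'), (2, 200.0, None)]
print(filtered)     # [(2, 200.0), (3, 300.0)]
```

With these toy tables the old and new ids never overlap, so a plain `UNION ALL` would return the same rows as the emulated full join. The join form only matters when the same loan can appear in both stores. Use 2 avoids adding rows only if each `loan_id` appears at most once in `borrowers`.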

How academia and business are different.

In academia, I get a lot of pleasure from helping students learn. This pleasure is pretty immediate, since the students are often right in front of me. Since I strongly value learning and education, I regularly see concrete ways in which I help the world, albeit in small ways each time.

My experience in business has been different, which is a bit ironic. I am working for a department in the bank that helps another department in the bank collect payments from people. There are several layers between where I am and the people I am supposed to be helping (at least one department). There is also a time delay: I am working on something that won’t be used for at least a couple of months, and that is the only thing I am working on.

However, the bank lends a lot to farmers (it was voted the best bank for agriculture, and the best bank overall, in Minnesota in 2021). The bank is doing good work for society by, say, helping farmers buy equipment so that we can have food to eat. Still (and I am embarrassed to say this as a mathematician), this seems a bit too abstract for me at times, and I sometimes struggle to recognize the importance of the work. But I truly believe that much of it is important; I just don’t always feel it.

How will this experience influence my teaching?

I don’t have much to say with respect to my sabbatical (unless it is subconscious), but I have been thinking a lot about labor-based grading. I am grateful to David Clark for being willing to have me bounce half-baked, rambling ideas off of him. He opened himself up to such treatment by mentioning labor-based grading in the excellent Grading for Growth Substack (with Robert Talbert).

Actually, there is one thing: it seems like there is a great demand for data-literate people in marketing. So I might push harder to get our business majors into our Data Analytics minor.

My feelings about being in industry.

My main experience right now is sadness that I am not teaching. I was briefly back on campus yesterday, and I was happy to see all of the students roaming around—and sad that I am not directly a part of it this year. This is part of the purpose of a sabbatical—to make you appreciate the great gig that you already have. Absence makes the heart grow fonder, and all.

Sabbatical July Report: My Year as a Data Scientist

August 2, 2021

I started working as a data scientist this June for my sabbatical project. Here is an update.

The skills needed to be a data scientist

I continue to use Python (pandas, in particular) daily. I have also needed to become familiar with other specialty software, such as the customer relationship management (CRM) software Salesforce.

Last week, I had to write my first for loop. Pandas does its best to let you avoid explicit loops, but it has its limits. Otherwise, it doesn’t really feel like I am coding; it feels like I am mainly just running a bunch of pandas commands to look at the data in different ways. Pandas makes things pretty easy.

Speaking of making things easy: the other people on the data team are working to use software so that everyone in the company can have access to the data via a drag-and-drop interface. I haven’t seen this software, so I can’t say much about it.

The most challenging part remains the banking knowledge. I have spent the last two weeks updating my code to look at finer-grained data. Think about trying to measure student learning and deciding on the unit of measurement. You could measure how the college is doing, how each class is doing, or how each individual student is doing. I essentially switched to looking at the analogue of “individual student” instead of “class.” This has both advantages and disadvantages, and we thought that the transition was worth it.
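Since the actual banking data is confidential, here is a sketch of the same switch using the student analogy, with made-up scores and plain Python. In pandas, a `groupby` would play the role of the hypothetical `average_by` helper. The example shows why the choice of unit matters: the class average and the average of the individual students' averages are not the same number.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (class, student, score on one assignment).
records = [
    ("Calc I", "Ana", 90), ("Calc I", "Ana", 70),
    ("Calc I", "Ben", 60),
    ("Stats", "Cai", 85), ("Stats", "Dee", 95),
]


def average_by(records, key):
    """Group the rows by key(row) and average the scores in each group."""
    groups = defaultdict(list)
    for row in records:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}


# Coarse unit: one number per class.
by_class = average_by(records, key=lambda r: r[0])
# Finer unit: one number per (class, student) pair.
by_student = average_by(records, key=lambda r: (r[0], r[1]))

class_level = by_class["Calc I"]                        # (90 + 70 + 60) / 3
student_level = mean([by_student[("Calc I", "Ana")],    # Ana averages 80
                      by_student[("Calc I", "Ben")]])   # Ben averages 60
print(class_level, student_level)  # the two summaries of Calc I differ
```

At the class level Ana counts twice because she has two scores; at the student level each student counts once, which gives 70 instead of about 73.3.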

One thing that I continue to struggle with is knowing where data lives. There are fields in the database that are always empty, so I need to learn not to look there. There are also many similarly named fields, and I need to talk to people to learn about what they mean. In some cases, it seems like no one is really sure.
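One way to make learning where data lives a bit more systematic is to measure how often each field is actually filled in. The sketch below does this with Python's `sqlite3` module. The table, its columns (borrowed from the `Name` versus `xpadsf_name__d` example in the June report), and the rows are all hypothetical. Because the table and column names are pasted into the SQL text, it should only be used with trusted names.

```python
import sqlite3


def fill_rates(conn, table):
    """Return {column: share of rows that are neither NULL nor blank}.

    The table and column names are inserted into the SQL text directly,
    so this is only meant for trusted, known names.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    rates = {}
    for col in columns:
        filled = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" '
            f"WHERE \"{col}\" IS NOT NULL AND TRIM(\"{col}\") <> ''"
        ).fetchone()[0]
        rates[col] = filled / total if total else 0.0
    return rates


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Name TEXT, xpadsf_name__d TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(None, "Ada"), ("", "Grace"), ("Alan", "Alan"), (None, "Edsger")],
)
print(fill_rates(conn, "customers"))  # {'Name': 0.25, 'xpadsf_name__d': 1.0}
```

A low fill rate does not say what a field means, so talking to people is still necessary, but it does flag which similarly named fields are worth asking about.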

How academia and business are different.

I don’t have much to report on this this month. Because I have an unusual, one-year position, I think that my experience isn’t that different from doing research. I mostly need to create my own structure, and I need to talk to people every couple of weeks to learn something, but I am largely working on my own. I am enjoying it.

I know that my experience isn’t typical for a data scientist, but I can’t report on what I don’t know.

How will this experience influence my teaching?

One interesting thing that I have learned is that my urge to procrastinate is much higher than it has been in the last couple of years. I am able to control this through time blocking, but it is definitely a noticeable feeling. My best guess is that I am a lot less competent in this role, since I am new to data science. I think that my lack of expertise makes me want to avoid the work.

This gives me new empathy for my students, and it makes me realize that focus is affected by competence. My students are new to the material in my classes (why else would they take the class?), so they likely have to fight procrastination at least as hard as I do now.

My feelings about being in industry.

See what I wrote for “How academia and business are different.”

Sabbatical June Report: My Year as a Data Scientist

July 2, 2021

I am on sabbatical for the 2021–2022 school year. I am working at a bank as a data scientist. My intent is to give regular reports on how this is going. Here are several things that I intend to focus on.

  • The skills needed to be a data scientist. In particular, I will need to write up a report about my experience that can be used if we ever have a data science major.
  • How academia and business are different. What can we learn from business? Robert Talbert did something similar when he was working at Steelcase.
  • How my experience is going to influence my teaching. In particular, I am not starting this sabbatical with mad data science skills, so I get to be in a student’s role this year.
  • My feelings about being in industry. I worked for two years before I went to graduate school (20 years ago!), so I am used to (and like) industry. However, I love academia. I will let you know if I have any interesting feelings.

I just finished my first month on the job. So far, I am really enjoying it. My team is good and helpful, and I have an interesting project. I am going to generally keep the details of my projects vague, since I am working with a lot of sensitive data. My first two projects will be using data to predict how existing business loans will fare. That is, I will be trying to figure out if we can predict when businesses might default on loans.

Here are my thoughts from the first month.

  • I am using Python, mostly pandas, in Jupyter notebooks. I am doing a lot less of what I think of as traditional “coding”: I don’t think I have used any loops yet (aside from the ones that pandas uses in the background), and I have only defined two functions. The Python part of the job really is about learning how to do things with pandas; I think that my coding skills are well beyond what the job has needed so far (I am a proficient coder, but I am nowhere near great).
  • I am making SQL calls to databases. These have been pretty basic so far, and my very rudimentary SQL skills from my previous stint in industry are more than sufficient to get me going. I have definitely learned a lot of SQL in the last month, but it has been painless.
  • One of the most challenging parts of my job is learning my way through the databases. There are several, and they contain similar (but not the same) information. Some parts of each database are fake. For example, if I want to find someone’s name, I might find that a field called “Name” is largely empty and I need to look at a field called xpadsf_name__d instead.
  • The most challenging part of the job—by far—is learning about banking (loans specifically). This is something that I can’t really reason out (or Google, oftentimes). There is technical vocabulary, and I have to learn how to think like a banker. I don’t see a way to be a good data scientist without the knowledge of the industry—you need to know what data might be important to look at.

I really like the job so far. My experience has been very similar to my research sabbatical—I just need to sit down each day and make progress on my project. I think that my boss has largely shielded me from other obligations (e.g. other departments asking for help), and I am grateful for that. It really does feel similar to doing research—I often feel lost, I usually make small progress, and I sometimes figure out something that makes a whole bunch of stuff come together.

I have been aware that, as a new learner, I am often completely confused, both when talking to other people and when working on my own. I deal with it by largely letting it wash over me, knowing that I am learning even though I am confused. I have been taking detailed notes as much as I can, and I have been saving text files with instructions for technical tasks that I know I will need to remember how to do. The whole process has been really enjoyable.

However, I think that it is enjoyable because I have already had a lot of practice being confused. Research in mathematics is hard, and I spend most of my research time being confused about something. My students have not necessarily had this experience. It might be nice to scaffold this in some way.

Similarly, and specific to data science: it might be nice to have students work with a series of databases. They might start with exactly the data that they need and then gradually build up (over a couple of years, perhaps) to a complicated mess full of extraneous and missing information.

Again, specific to data science: our curriculum should focus on topics in which our students can easily develop expertise. This is not easy, and my best idea so far is to have them study school-related issues (they all have a lot of experience with schools, after all).

Again, specific to data science: it is often said that 80% of a data scientist’s time (or something close to it) is spent cleaning data, where cleaning data means finding it, making sure that it is the right type, combining data from different sources, creating new data out of old data (e.g. averaging some numbers), etc. This is the tedious (although enjoyable in its own way) prep work that you must do before doing the fun stuff. This has largely been true in my experience. While I don’t necessarily think that students in a data science major should spend 80% (or whatever) of their time cleaning data, they should definitely be doing it on a regular basis.
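For readers who have not done this kind of work, here is a tiny, made-up illustration of those cleaning steps in plain Python. Two hypothetical sources record payments differently (strings with dollar signs and commas in one; floats, missing values, and a different id format in the other). The values are normalized to one type, combined by id, and then used to derive a new value. Real data would need many more rules; these are only examples.

```python
from statistics import mean

# Hypothetical raw exports from two systems that store similar data differently.
source_a = [{"id": "001", "payment": "$1,200.50"}, {"id": "002", "payment": " 300 "}]
source_b = [{"loan_id": 1, "payment": 800.0}, {"loan_id": 3, "payment": None}]


def to_amount(value):
    """Convert '$1,200.50', ' 300 ', 800.0, or None to a float (or None)."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    cleaned = value.strip().replace("$", "").replace(",", "")
    return float(cleaned) if cleaned else None


def combine(source_a, source_b):
    """Map each integer id to the list of its non-missing payments."""
    payments = {}
    for row in source_a:  # ids arrive as zero-padded strings here
        payments.setdefault(int(row["id"]), []).append(to_amount(row["payment"]))
    for row in source_b:  # and as integers under a different field name here
        payments.setdefault(int(row["loan_id"]), []).append(to_amount(row["payment"]))
    return {k: [p for p in v if p is not None] for k, v in payments.items()}


combined = combine(source_a, source_b)
# Derived data: the average payment per id, skipping ids with no usable payments.
avg_payment = {k: mean(v) for k, v in combined.items() if v}
print(avg_payment)  # {1: 1000.25, 2: 300.0}
```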

I will end by noting that I am mindful that ethics can be an issue in data science. I will write about ethical issues when appropriate and useful.

So: this has been a great experience so far! I already miss being a professor, and it is weird to think that I am going to be away from my campus for about 2.5 years (1.5 years due to the pandemic including the summer, and one year due to my sabbatical). I am going to be excited to be back on campus, but I am enjoying my sabbatical for now!

Ingressive vs Congressive

May 13, 2021

I am starting to be a big fan of Eugenia Cheng. She has a bunch of thoughtful essays on mathematics, and I enjoyed How to Bake Pi.

Her latest book is X+Y: A Mathematician’s Manifesto for Rethinking Gender. The book has two parts. The first talks about how a mathematician might think about gender, which I found to be a stretch and not very strong. At this point, I was disappointed.

However, the second part introduces a fantastic idea. We have this idea about “masculine” traits (e.g. courage, assertiveness, etc) and “feminine” traits (e.g. empathy, humility, etc). Cheng says this is problematic because gender and these traits are actually different dimensions.

Her solution is to introduce new vocabulary, which I find to be an extremely powerful idea. Her suggestions are “ingressive” as a replacement for “masculine” and “congressive” for “feminine.”

Here are two examples from old Saturday Night Live episodes. Will Ferrell’s version of Janet Reno should be described as “ingressive” rather than “masculine,” since Janet Reno is a woman (how can it be “masculine” if she is a woman?). Similarly, Al Franken’s Stuart Smalley would be “congressive” rather than “feminine” (why would we say “feminine,” given that Stuart is male?).

Describing a woman as “masculine” or a man as “feminine” has a judgment built into it (“You are doing it wrong!”). The words “ingressive” and “congressive” simply describe behavior without the judgment.

If you are interested, I encourage you to read the book. Again, I would very lightly skim through Part I; just read enough to get the context. Part II is where the good stuff is.

Cheng also makes a case for why mathematics is too ingressive, and she offers some examples of how to make mathematics more congressive.

I love this contribution, but I also wonder if this has already been proposed by someone in gender studies (likely with different vocabulary). Do you know of any work from that field that is related to what Cheng did?

Apple Watch as Dumb Phone

May 5, 2021

My phone for the past seven years or so has been an AT&T Z222 flip phone. I had a similar dumb phone prior to that for about a year, and I didn’t have a phone prior to then. I have been getting dumb phones to save me from myself. They are also a lot cheaper, both up-front and monthly.

I got a notice from AT&T saying that they will no longer support my Z222 in February 2022 due to their new 5G network. Soon after, the Z222 started to fail (it wouldn’t retrieve text messages, and the alarms became unreliable).

I looked for a replacement phone, but there weren’t great options: the ones available didn’t get great reviews, and they were more expensive than I wanted (I prefer not to pay $80 for a dumb phone).

Enter the Apple Watch. This is really perfect for me. Here is how I did it.

  • An Apple Watch needs to be tied to an iPhone. This would seem to be a problem for me, but my wife has an iPhone. I can tie the Watch to her iPhone, and then I can have my own phone number through Apple’s Family Plan. The Family Plan is largely meant for kids (give the kid a watch instead of a phone) and elderly parents, but they will let anyone sign up.
  • This does everything I want it to do, and not much else. Cal Newport has long recommended using your smartphone for just calls, texts, maps, and audio. The Apple Watch does all of these things well. I can also look things up online if I need to, but I am not about to go down any rabbit holes, since it isn’t pleasant to read for very long on such a small screen.
  • It has a low monthly cost. My Z222 was pre-paid, and $108 would last me roughly one year. My Apple Watch costs $120 per year (through Truphone), which is essentially the same (it is more some years, less others depending on how much I use the phone during the year). For roughly the same price, I get unlimited minutes now.
  • The Apple Watch is expensive up front—roughly $300. However, I use a running watch to track my speed and distance when I run, and the Apple Watch is only marginally more expensive than a replacement watch. So for $300, I get a new running watch, a cell phone, and probably save about $50.
  • I love new tech, and I think that it is just cool.

Now, I don’t talk on the phone a lot, and I don’t text a ton (when I do, the voice dictation works really well). So while this is a great option for me, it might not be for most.

I have been using the Watch as my cell phone since February, and it is the right solution for me. I am largely posting this because I had to work pretty hard to figure out that (1) there is a Family Plan and (2) it would work for me.

Discrete Mathematical Modeling Postmortem

April 21, 2021

I just got done teaching my Discrete Mathematical Modeling class. See here, here, here, and here for previous posts. Here is my summary of how the class went.

I did not do a good job with the course. I am not down on myself about it, since I don’t think that there was a way for me to do much better: I found out that I was teaching the course in January, the class started in March (we are on block scheduling this year), and I had two intensive Calculus II blocks between when I found out and when I started teaching the course. I simply didn’t have a lot of time to plan. That said, I am pretty impressed with the level of planning that I was able to do, considering the constraints.

Moreover, I didn’t have a lot of time to adjust once the block started. In particular, I was the chair of a search committee, in charge of registration for my department, advising a student about to defend her thesis, finishing up an independent learning project with three students, and probably a couple of other things during the block. It was kind of a ridiculous confluence of big jobs. Again, I am proud of the way that I managed my time, but I was pretty constrained and I wasn’t able to do everything I wished I could have.

The interesting thing to me is that I wasn’t doing an objectively good job teaching, but I have enough teaching experience that I could recognize this in real time. I think this was my first subpar teaching effort where I wasn’t subject to the Dunning–Kruger effect. Here were my main issues.

  • I didn’t have enough time to prepare and adjust, as noted above.
  • In particular, the block schedule hindered my ability to adjust, even setting aside all of the other obligations. I didn’t have a good sense of how well things were working until 1.5 to 2 weeks in, by which time the class was halfway over.
  • My pedagogical content knowledge wasn’t where it needed to be. I hadn’t taught the class before, nor had I thought deeply about modeling before the class began. I simply wasn’t aware of what the students would find easy and what they would struggle with. This made teaching doubly hard: I didn’t know what to look for, and I didn’t always have the fastest way of correcting misconceptions. Because I couldn’t predict how students would react to the material, I didn’t have tools ready to help them.
  • Related: I mainly stole my modeling problems from trusted sources. However, the problems didn’t do what I wanted of them. In some sense, they weren’t as rich as I wanted them to be. Either I didn’t select the problems well, or my expectations for what problems could be were too high.

I have mainly been teaching the same classes for the past 13 years, and so I have developed a good sense of what to expect students to do. It was an interesting (and unpleasant) experience not to be armed with ways to help students. It made me recognize how much progress I have made as a teacher in the last 20 years.

On the plus side, a lot of things went well.

  • Doing a version of ungrading was a really smart move on my part. This gave me a lot of flexibility that I didn’t know that I would need. When the class struggled with one particular project, I had the freedom to walk the class through it. This was really the best thing for their learning, and it didn’t mess up my grading scheme—I just gave them all credit for it. In my original plan, this would have meant that they only would have had four team projects instead of two, and I would have had to wring my hands about how to adjust my grading scheme on the fly. Students also reported that they appreciated the ungrading.
  • The class was the next step in moving my courses toward the CURE end of the spectrum. The projects were open-ended, and the students had great flexibility to look at problems in the way that they wanted. It was really exciting!
  • I felt like I did justice to the Justice theme. We did some modeling on the Flint water crisis and gerrymandering. Students had to do an individual project in which they modeled a Justice topic. I thought that it would be difficult for students to find topics, so I supplied them with a default topic (model how to set up a wheelchair program in an airport: How many wheelchairs do you need? How much would it cost? What sort of system can you install to get the wheelchairs to where they need to be?). I was surprised that only two students did the default project, and I was very impressed by the topics students came up with. Some student-generated topics were: how universal childcare would affect the wage gap between men and women in the U.S., how the availability of generic drugs would affect the health of people who need them, and how to distribute stuffed animals to children who are detained at the border.
  • I enjoyed working with the students. They were game, and they came up with good ideas. I was a bit surprised at how many of them expressed that they really liked the class, given that I was a bit down on my own teaching.

I will teach this class again. It was really enjoyable, and I would love to do it with the proper amount of prep time.

Picture a Scientist

April 7, 2021

This is a short post to advertise the movie Picture a Scientist. This film does a nice job describing the difficulties of being a woman in STEM. These range from not being given as much lab space as men, to being assumed to be a janitor, to being terrorized and bullied while in the field in Antarctica. It is worth watching.

Trello

March 31, 2021

I read A World Without Email, and Cal Newport used it to finally convince me to use Trello to organize my projects (something his podcasts failed to do). The jury is still out, but I can definitely see the advantages of having all of your obligations visually laid out and limiting yourself to the number that you can work on at any one time.

I have been using it for a couple of weeks, and it is (1) helpful but (2) I am not sure if it is helpful enough. My old system of using text files worked well enough, and they don’t require me to use one more tool (even if it is a good one). Still, I am going to try it for six months or so to see if it sticks.

Ungrading in Discrete Mathematical Modeling

March 24, 2021

My Discrete Mathematical Modeling course is off to a good start. Here is what we have done so far:

  • We had a fantasy basketball auction, where the players were bid on only by formulas.
  • We did several Three Act activities.
  • We introduced the ideas of expected values, discrete dynamical systems, and linear programming (OpenSolver crashed, so we didn’t finish the linear programming problem).
  • We talked about Justice.
  • We talked about course policies, including grading.


This course has a Justice theme, and Justice involves power. I wanted to be very aware of the power I held over students, and I wanted to eliminate as many unjust parts of being a teacher as I could.

One thing that I did was to introduce something that is inching closer to ungrading. This is very similar to what I did in my capstone classes these last couple of years. Essentially, I gave them a list of assignments that they need to do. They can keep revising them until they get credit for them. If they do all of the assignments to completion, they get at least a B in the course.

This makes sense to me as far as the course goes, too. Modeling is similar to research, in that the students won’t end up at a predictable place (and they shouldn’t).

Scheduling and Hiring

March 17, 2021

We are in the middle of hiring two one-year term positions. We skipped interviewing at the JMM because (1) we intentionally set our deadline to be late (March 1st) so that we can minimize the competition with tenure-track jobs, and (2) there is a pandemic.

To replace JMM interviews, we are meeting via Zoom. Scheduling several interviews sounded like a nightmare, and I did not want to do it by email. Instead, I turned to an old friend: youcanbook.me (I am not paid by youcanbook.me; I just like their service).

Here are the steps I took.

  • I had my search committee (five total people) send me times when they are available.
  • I manually figured out all of the times when at least two of us were available to interview.
  • I created a new calendar within my work Outlook that was specifically for hiring.
  • I blocked off all of the times in this Outlook calendar when fewer than two of us were available.
  • I linked this Outlook calendar to youcanbook.me.
  • I emailed (using Python) all of the applicants a welcome message and the link to youcanbook.me.
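The step of finding the shared times was done by hand; with more candidates or a larger committee, a few lines of Python could do it. This sketch assumes each member's availability has been typed in as a list of slot labels (the labels, member names, and function names below are hypothetical). It stops at producing the open and blocked slots and does not talk to Outlook or youcanbook.me.

```python
from collections import Counter


def open_slots(availability, min_people=2):
    """Return the slots (sorted by label) where at least min_people are free."""
    # set() so a member who lists a slot twice is only counted once
    counts = Counter(slot for slots in availability.values() for slot in set(slots))
    return sorted(slot for slot, n in counts.items() if n >= min_people)


def blocked_slots(all_slots, open_list):
    """Slots to block off on the hiring calendar: everything that is not open."""
    open_set = set(open_list)
    return [slot for slot in all_slots if slot not in open_set]


# Hypothetical availability for a five-person committee.
availability = {
    "A": ["Mon 09", "Mon 10"],
    "B": ["Mon 10", "Tue 09"],
    "C": ["Tue 09"],
    "D": [],
    "E": ["Mon 10"],
}
all_slots = ["Mon 09", "Mon 10", "Tue 09", "Tue 10"]

interviewable = open_slots(availability)
print(interviewable)                            # ['Mon 10', 'Tue 09']
print(blocked_slots(all_slots, interviewable))  # ['Mon 09', 'Tue 10']
```

Sorting by label is only alphabetical, so a real version would want slots stored as dates and times rather than strings.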

This whole process took me less than an hour. The toughest part was figuring out that I needed to create a new calendar in Outlook (other than that, this was a 15-minute job).

Then the magic happened. By the time I had sent the email and checked my calendar, two people had already scheduled interviews! The rest followed within about 24 hours, with almost no effort on my part (I just had to update the search committee’s shared document with the times of the interviews and the assigned interviewers).

The only drawback that I can see is that the workload was not equal among the search committee members. One member had significantly fewer interviews than the others. I can live with that, though.

There are a bunch of other tools out there that will do the same, but I am very familiar with youcanbook.me already, and I don’t need any more features than it offers for free.