Channel: Naftali Harris: Statistician, Hacker, and Climber
Browsing all 48 articles

Don't Double Major

Double majoring in college is a suboptimal strategy. The reason is simple: it adds a substantial set of constraints to the courses you can take, but in return gives you only a very modest extra...



A College Waitlisting Model

Suppose a selective college wants $N_0$ students in their freshman class. How many students should they admit, and what's the distribution of the number of students they'll admit off the waitlist? Of...



College Interview Tips

The college admissions interview is a valuable component of college applications because it provides admissions officers with a holistic evaluation from a source that has no vested interest in your...


Visualizing Lasso Polytope Geometry

Some recent research about the lasso exploits a beautiful geometric picture: Suppose you fix the design matrix X and the regularization parameter $\lambda$. For a particular value of y, the...


Sensitivity of Independence Assumptions

Recently I was considering an interesting problem: Several people interview a potential job candidate, and each of them scores that candidate numerically on some scale. What's the variation associated...



Python Subclass Relationships Aren't Transitive

Subclass relationships are not transitive in Python. That is, if A is a subclass of B, and B is a subclass of C, it is not necessarily true that A is a subclass of C. The reason for this is that with...
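The claim can be checked with the standard library alone. The following is a minimal sketch for Python 3 using `collections.abc.Hashable`; the choice of `list`, `object`, and `Hashable` as A, B, and C is an illustration, and the truncated teaser does not say whether the article uses the same classes.

```python
from collections.abc import Hashable

# A = list, B = object, C = Hashable (illustrative choice, not from the teaser).

# Every class is a subclass of object.
a_sub_b = issubclass(list, object)

# object defines __hash__, so the Hashable ABC accepts it as a subclass.
b_sub_c = issubclass(object, Hashable)

# list sets __hash__ to None, so the Hashable ABC rejects it.
a_sub_c = issubclass(list, Hashable)

print(a_sub_b, b_sub_c, a_sub_c)  # True True False
```

The check against `Hashable` is not based on inheritance; the abstract base class decides membership by inspecting the class for a usable `__hash__`, which is why the chain can break.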


Robust Machine Learning

Real data often has incorrect values in it. Origins of incorrect data include programmer errors ("oops, we're double counting!"), surprise API changes (a function used to return proportions, suddenly...


T-Tests Aren't Monotonic

R. A. Fisher and Karl Pearson play a heated round of golf. Being statisticians, they agree before the round to run a two-sided paired t-test to see if either of them is statistically significantly...



How to Forge an Email

Most people don't realize how easy it is to forge an email. Say my brother John Doe uses the email address john.doe@example.com. If I get an email from that address, it's natural to assume that John...



Half the Decimal Trick

If something happened 1,234 out of 10,000 times, we'd estimate that the true probability of occurrence is about 0.1234. Of course, we wouldn't expect the true probability to be exactly 0.1234, and to...
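To get a feel for how far the true probability might sit from 0.1234, one option is the textbook binomial standard error, sqrt(p(1 - p) / n). The sketch below assumes independent trials and uses a hypothetical helper name, `proportion_standard_error`; it is not the "half the decimal" trick itself, which the truncated teaser does not spell out.

```python
import math


def proportion_standard_error(successes, trials):
    """Return the estimated proportion and its binomial standard error.

    Assumes independent trials with a common success probability.
    """
    p_hat = successes / trials
    se = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, se


p_hat, se = proportion_standard_error(1234, 10000)
print(f"estimate = {p_hat:.4f}, standard error = {se:.4f}")
# estimate = 0.1234, standard error = 0.0033
```

Under these assumptions the uncertainty already shows up in the third decimal place, so the trailing digits of 0.1234 carry little information.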


You Can't Predict Small Counts

A small restaurant is interested in predicting how many customers will come in on a given night. This is valuable information to know ahead of time, for example, so that the restaurant can figure out...


Visualizing DBSCAN Clustering

A previous post covered clustering with the k-means algorithm. In this post, we consider a fundamentally different, density-based approach called DBSCAN. In contrast to k-means, which modeled clusters...


Machine Learning over JSON

Supervised machine learning is the problem of approximating functions X -> Y from many example (x, y) pairs. Now, the vast majority of supervised learning algorithms assume that X is p-dimensional...



OHMS Lessons Learned

Note: I found the following post as an almost complete draft as I was reading some of my unpublished posts. I wrote it around October 1st, 2013, at the beginning of what would end up being my last year...


Desperation Motivated Creativity

I am not the strongest climber. Some of the people I've climbed with are so strong that they can do a one-arm pull-up, and then--while locking off with one arm--sing the "Head, Shoulders, Knees and...



Why I'm Making Tauthon

For the past two months I've been spending half my time on Tauthon. Tauthon is a backwards-compatible Python interpreter that runs Python 2 code and C-extensions exactly as-is, while also allowing...


An Easy Chess Puzzle

I was looking through Markovian, my old chess engine, recently, and came across the first game it won against another chess engine. Stepping through the game, it seems that both engines actually played...



Continuous Time Lending

Assume a borrower takes out an installment loan of size $1$ and makes continuous-time payments on it. The installment loan starts at time $0$, ends at time $T$, and has an interest rate of $r$,...
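As a rough sketch of this setup, suppose the rate $r$ is continuously compounded and the borrower pays at a constant rate $c$; both are assumptions, since the teaser is cut off before stating them. The balance $B(t)$ then satisfies $dB/dt = rB - c$ with $B(0) = 1$ and $B(T) = 0$, which gives $c = r / (1 - e^{-rT})$. The function names `payment_rate` and `balance` below are hypothetical.

```python
import math


def payment_rate(r, T):
    """Constant payment rate that retires a unit loan by time T.

    Assumes continuous compounding at rate r > 0 (an assumption of this sketch).
    """
    return r / (1.0 - math.exp(-r * T))


def balance(t, r, T):
    """Outstanding balance at time t, solving dB/dt = r*B - c with B(0) = 1."""
    c = payment_rate(r, T)
    growth = math.exp(r * t)
    return growth - c * (growth - 1.0) / r


r, T = 0.10, 5.0  # illustrative values, not taken from the article
print(payment_rate(r, T))  # payment per unit time
print(balance(0.0, r, T))  # starts at the loan size, 1
print(balance(T, r, T))    # approximately 0 at maturity
```

The total paid over the life of the loan is $cT$, so $cT - 1$ is the total interest under these assumptions.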


Day-to-Day Operations of Palo Alto

Palo Alto runs a pretty open city government, with a number of interesting documents available for download on their website. Of particular interest are their annual budgets and annual financial...


Implementing "nonlocal" in Tauthon: Part I

Tauthon is a fork of Python 2.7 with syntax, builtins, and libraries backported from Python 3. It aspires to be able to run all valid Python 2 and 3 code. In this article, I begin discussing how I was...
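For readers who have not used the statement, here is a minimal Python 3 example of what `nonlocal` does. It shows only the language feature as it exists in Python 3; it says nothing about how Tauthon implements it, and `make_counter` is a hypothetical name.

```python
def make_counter():
    count = 0

    def increment():
        # Without this declaration, "count += 1" would raise UnboundLocalError,
        # because assignment would make count local to increment().
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)  # 1 2
```

Python 2.7 has no `nonlocal` keyword, so this snippet is a syntax error there.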
