Channel: Naftali Harris: Statistician, Hacker, and Climber
Browsing latest articles
Browse All 48 View Live

Popping the Hood

Lots of things appear to be magic: Computers, the Banach-Tarski Paradox, cars, the phenomenal success of companies like Facebook, airplanes, the Internet, the Central Limit Theorem, and the fact that...

View Article



Finding Isomorphisms Between Finite Groups

One of the most interesting problems I came across as I was building my Abstract Algebra package was that of finding an isomorphism between two finite groups G and H, represented by their Cayley...
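The article's own algorithm is cut off here. As a baseline, the most naive approach simply tries every bijection and checks the homomorphism property against the two Cayley tables. The sketch below does that. `find_isomorphism` and the example tables are made up for illustration, and with $n!$ candidates it is only practical for very small groups.

```python
from itertools import permutations

def find_isomorphism(g_table, h_table):
    """Brute-force search for an isomorphism between two finite groups.

    Each group is given by its Cayley table over elements labeled 0..n-1:
    table[a][b] is the label of the product a*b. Returns phi as a list with
    phi[a] the image of a, or None if the groups are not isomorphic.
    """
    n = len(g_table)
    if n != len(h_table):
        return None  # different orders: no bijection exists
    for phi in permutations(range(n)):
        # Keep phi if phi(a*b) == phi(a)*phi(b) for every pair a, b.
        if all(phi[g_table[a][b]] == h_table[phi[a]][phi[b]]
               for a in range(n) for b in range(n)):
            return list(phi)
    return None

# Hypothetical example groups of order 4.
z4 = [[(a + b) % 4 for b in range(4)] for a in range(4)]   # integers mod 4 under +
klein = [[a ^ b for b in range(4)] for a in range(4)]      # Z2 x Z2 via bitwise XOR
# {1, 2, 3, 4} under multiplication mod 5, relabeled as 0..3.
units_mod_5 = [[(a + 1) * (b + 1) % 5 - 1 for b in range(4)] for a in range(4)]

print(find_isomorphism(z4, units_mod_5))  # some isomorphism (both are cyclic)
print(find_isomorphism(z4, klein))        # None
```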

View Article

How My Chess Engine Works

I have always had a lot of respect for chess, despite the fact that I'm not very good at it myself. As I learned more about the game, I also heard about the successes of computer chess AIs, in...

View Article

CSS Gotchas

I'm still learning HTML and CSS. While debugging the simple websites I write (including this one), I've encountered a lot of behavior that seemed very counterintuitive to me. I thought I'd share some of this...

View Article

Incompetence as Encouragement

A few weeks ago, I was looking for a good Python package that could convert user-written strings into numbers. I thought for sure a ton of people would have done this, but I was actually unable to find...

View Article


Being the First

I was at the climbing gym a few weeks ago with a friend of mine. We were just messing around, making up bouldering problems for ourselves to do. One of the problems my friend came up with was a...

View Article

Don't Trust Asymptotics: Part I

Suppose I give you a sequence of real numbers $x_n$, and tell you that $\lim_n x_n = \infty$. What can you tell me about $x_{100}$? How about $x_{1,000,000}$?
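The answer is: nothing. One sequence shows this directly. It was chosen here for illustration and may not be the article's example. The sequence diverges, yet its hundredth and millionth terms are both hugely negative:

```latex
x_n = n - 10^{100}, \qquad \lim_{n \to \infty} x_n = \infty, \qquad
x_{100} = 100 - 10^{100} < 0, \qquad x_{1{,}000{,}000} = 10^{6} - 10^{100} < 0.
```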

View Article

Goldbach's Conjecture and Coding Length

Goldbach's conjecture is that every even integer greater than or equal to four can be written as the sum of two prime numbers. (Try it: $4 = 2 + 2$, $6 = 3 + 3$, $8 = 3 + 5$, $10 = 3 + 7$, $12 = 5 + 7$...)...
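The sketch below reproduces those decompositions with a simple trial-division search. `goldbach_pair` is a made-up name. The search returns the pair with the smallest first prime. It is only a check of the examples and says nothing about the coding-length argument the article goes on to make.

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def goldbach_pair(n):
    """Return primes (p, q) with p <= q and p + q == n, choosing the smallest p."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None  # no pair found: this would be a counterexample

for n in range(4, 13, 2):
    p, q = goldbach_pair(n)
    print(f"{n} = {p} + {q}")
```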

View Article


The Hottest Person in the Group

Suppose you think you're pretty good-looking. In fact, you think you're at about the 90th percentile--you're hotter than 90% of people and not as hot as the other 10%. If I put you with another nine...
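The summary is cut off, so two assumptions apply. First, the question is taken to be how likely you are to be the hottest of the ten. Second, the other nine are taken to be drawn independently from the population, so each falls below you with probability 0.9. Under those assumptions the answer is $0.9^9 \approx 0.387$. The sketch below checks this exactly and by simulation. The function names are hypothetical.

```python
import random

def prob_hottest_exact(percentile, others):
    """P(you outrank all `others` people), each drawn independently at random."""
    return percentile ** others

def prob_hottest_simulated(percentile, others, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Each other person's percentile is uniform on [0, 1).
        if all(rng.random() < percentile for _ in range(others)):
            wins += 1
    return wins / trials

print(prob_hottest_exact(0.9, 9))      # about 0.387
print(prob_hottest_simulated(0.9, 9))  # close to the exact value
```

So even at the 90th percentile, you are more likely than not to meet someone hotter in a group of ten.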

View Article


Memory Locality and Python Objects

I've been obsessed with sorting over the last few weeks as I write a Python C extension implementing a lazily-sorted list. Among the many algorithms that this lazily-sorted list implements is...

View Article

Visualizing the James-Stein Estimator

In the words of one of my professors, "Stein's Paradox may very well be the most significant result in Mathematical Statistics since World War II." The problem is this: You observe $X_1, \ldots, X_n...
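Here is a minimal sketch, assuming the standard setup: independent $X_i \sim N(\theta_i, 1)$ for $i = 1, \ldots, n$ with $n \ge 3$, one observation each. The James-Stein estimator shrinks $X$ toward the origin. Its expected total squared error is lower than that of $X$ itself for every $\theta$. The $\theta$ used in the simulation is arbitrary and chosen only for illustration.

```python
import random

def james_stein(x, sigma2=1.0):
    """James-Stein estimate of theta from one draw X ~ N(theta, sigma2 * I).

    Shrinks X toward the origin by the factor 1 - (n - 2) * sigma2 / ||X||^2.
    Needs n >= 3 coordinates.
    """
    n = len(x)
    if n < 3:
        raise ValueError("James-Stein needs at least 3 coordinates")
    norm2 = sum(xi * xi for xi in x)
    shrink = 1.0 - (n - 2) * sigma2 / norm2
    return [shrink * xi for xi in x]

def squared_error(est, theta):
    return sum((e - t) ** 2 for e, t in zip(est, theta))

# Monte Carlo comparison of average squared error (risk) for a hypothetical theta.
rng = random.Random(0)
theta = [0.5] * 10
trials = 2000
mle_total = js_total = 0.0
for _ in range(trials):
    x = [t + rng.gauss(0.0, 1.0) for t in theta]  # X_i ~ N(theta_i, 1)
    mle_total += squared_error(x, theta)            # estimate theta by X itself
    js_total += squared_error(james_stein(x), theta)
mle_risk = mle_total / trials
js_risk = js_total / trials
print(mle_risk, js_risk)  # the James-Stein risk comes out well below the raw-X risk
```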

View Article

Markov Chain Implications Graph

I've been studying for quals for the last several weeks. Today, I was reviewing basic Markov Chain theory, and decided to understand it by drawing a graph of various statements about Markov Chains....

View Article

Martingale Implications Graph

Here's another directed graph of statement implications that I used to study for quals. This one is about convergence of stochastic processes with martingales. Like my Markov Chain Implications Graph,...

View Article


The Zero Times Infinity Problem

There are two ways to keep yourself safe while rock climbing: The first option is to protect yourself carefully with ropes and gear, so that if you fall you won't fall too far or hard. The second...

View Article

The Ten Best Ideas in Statistics

I've been studying Statistics for six years now, seriously for the last four years, and as my main focus for the last three. Now that I've finished the core PhD curriculum at Stanford, I've spent some...

View Article


The LaTeX Numbers

Let's define the LaTeX numbers to be the set of all real numbers that can be unambiguously expressed with the LaTeX typesetting system. This set of numbers has a few fun properties, not least of which, as...

View Article

How I Got 2x Speedup with One Line of Code

If you had asked me whether or not it was possible to get a 2x speedup for my LazySorted project by adding a single line of code, I would have told you "No way, substantial speedups can really only...

View Article


Visualizing K-Means Clustering

Suppose you plotted the screen width and height of all the devices accessing this website. You'd probably find that the points form three clumps: one clump with small dimensions (smartphones), one...
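The standard way to find such clumps is Lloyd's algorithm for k-means. It alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. Below is a minimal pure-Python sketch for 2-D points. The function names are hypothetical, and the article's visualization code is not shown here.

```python
import random

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def assign(points, centers):
    """Assignment step: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        clusters[nearest].append(p)
    return clusters

def mean_point(cluster):
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = assign(points, centers)
        # Update step: move each center to its cluster's mean (keep it if the cluster is empty).
        new_centers = [mean_point(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # no center moved, so the assignments are stable
        centers = new_centers
    return centers, assign(points, centers)
```

On well-separated clumps like the phone, tablet, and desktop screen sizes described above, this typically finds one center per clump. An unlucky initialization can still converge to a worse local optimum, so rerunning with several seeds is a common remedy.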

View Article

How to Solve Problems

I spend a lot of my time solving problems: I solve well-defined math problems in grad school, open-ended problems in statistical consulting, physical puzzles when I'm rock climbing, architectural...

View Article

A Statistical Analysis of Climbing

Recently, 28 of us on the Stanford Climbing Team completed a short survey on our climbing abilities. Although the survey was intended to assess our interest in different clinics, the answers to the...

View Article

Don't Double Major

Double majoring in college is a very suboptimal strategy. The reason is simple: It adds a substantial set of constraints to the courses you can take, but in return gives you only a very modest extra...

View Article


A College Waitlisting Model

Suppose a selective college wants $N_0$ students in their freshman class. How many students should they admit, and what's the distribution of the number of students they'll admit off the waitlist? Of...

View Article


College Interview Tips

The college admissions interview is a valuable component of college applications because it provides admissions officers with a holistic evaluation from a source that has no vested interest in your...

View Article

Visualizing Lasso Polytope Geometry

Some recent research about the lasso exploits a beautiful geometric picture: Suppose you fix the design matrix X and the regularization parameter $\lambda$. For a particular value of y, the...

View Article

Sensitivity of Independence Assumptions

Recently I was considering an interesting problem: Several people interview a potential job candidate, and each of them scores that candidate numerically on some scale. What's the variation associated...
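The article's exact model is cut off here. Take one common simplifying assumption: each interviewer's score has variance $\sigma^2$, and every pair of scores has correlation $\rho$. Then the variance of the average of $n$ scores is $\frac{\sigma^2}{n}\left(1 + (n - 1)\rho\right)$. A quick sketch of how sensitive that is to $\rho$ follows. The function name is hypothetical.

```python
def var_of_mean(sigma2, n, rho):
    """Variance of the average of n scores, each with variance sigma2 and
    common pairwise correlation rho (equicorrelation assumption)."""
    return sigma2 / n * (1 + (n - 1) * rho)

# Five interviewers, unit-variance scores, increasing correlation.
for rho in (0.0, 0.1, 0.3):
    print(rho, var_of_mean(1.0, 5, rho))
```

With $\rho > 0$ the variance never falls below $\sigma^2 \rho$, however many interviewers are added.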

View Article


Python Subclass Relationships Aren't Transitive

Subclass relationships are not transitive in Python. That is, if A is a subclass of B, and B is a subclass of C, it is not necessarily true that A is a subclass of C. The reason for this is that with...
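Here is one concrete case from the standard library, which may or may not be the example the article uses. `collections.abc.Hashable` decides subclass membership through `__subclasshook__`. That hook checks for a non-`None` `__hash__` rather than looking at inheritance.

```python
from collections.abc import Hashable

# Every class, including list, is a subclass of object.
print(issubclass(list, object))      # True
# object defines __hash__, so Hashable.__subclasshook__ accepts it.
print(issubclass(object, Hashable))  # True
# list sets __hash__ to None, so the hook rejects it: transitivity fails.
print(issubclass(list, Hashable))    # False
```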

View Article

Robust Machine Learning

Real data often has incorrect values in it. Origins of incorrect data include programmer errors ("oops, we're double counting!"), surprise API changes (a function used to return proportions, suddenly...

View Article

T-Tests Aren't Monotonic

R. A. Fisher and Karl Pearson play a heated round of golf. Being Statisticians, they agree before the round to run a two-sided paired T-test to see if either of them is statistically significantly...
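Here is a minimal illustration with made-up per-hole margins. Making one hole's margin much larger inflates the standard deviation of the differences faster than their mean, so the paired $t$ statistic drops. With the number of holes fixed, a smaller $|t|$ means a larger two-sided p-value. The numbers and function name are hypothetical, not the article's.

```python
import math
import statistics

def paired_t_statistic(diffs):
    """t statistic for a paired t-test on per-hole score differences."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical margins by which one golfer beats the other on each of four holes.
close_round = [1, 1, 1, 2]
blowout_round = [1, 1, 1, 10]  # identical, except one hole is won by far more
print(paired_t_statistic(close_round))    # 5.0
print(paired_t_statistic(blowout_round))  # about 1.44
```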

View Article

How to Forge an Email

Most people don't realize how easy it is to forge an email. Say my brother John Doe uses the email address john.doe@example.com. If I get an email from that address, it's natural to assume that John...

View Article



Half the Decimal Trick

If something happened 1,234 out of 10,000 times, we'd estimate that the true probability of occurrence is about 0.1234. Of course, we wouldn't expect the true probability to be exactly 0.1234, and to...
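One way to quantify the uncertainty is a simple binomial model, though this may not be exactly the article's argument. Under that model the standard error of $\hat{p}$ is $\sqrt{\hat{p}(1 - \hat{p})/n}$, and it can never exceed $1/(2\sqrt{n})$. With $n = 10{,}000$ that bound is $0.005$, so roughly only the first two of the four printed decimals carry real information. The function name below is hypothetical.

```python
import math

def binomial_standard_error(successes, trials):
    """Standard error of the estimated proportion successes / trials."""
    p = successes / trials
    return math.sqrt(p * (1 - p) / trials)

p_hat = 1234 / 10_000
se = binomial_standard_error(1234, 10_000)
worst_case = 0.5 / math.sqrt(10_000)  # bound on the standard error for any p
print(p_hat, se, worst_case)  # se is about 0.0033, so the third decimal is already uncertain
```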

View Article

You Can't Predict Small Counts

A small restaurant is interested in predicting how many customers will come in on a given night. This is valuable information to know ahead of time, for example, so that the restaurant can figure out...
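Here is a rough illustration. It assumes nightly customer counts follow a Poisson distribution whose mean the restaurant somehow knows exactly; the mean of 20 is made up. The standard deviation is $\sqrt{\lambda}$, so a relative noise of about $1/\sqrt{\lambda}$ remains no matter how good the model is.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 20  # hypothetical: the true expected number of customers per night
relative_sd = 1 / math.sqrt(lam)
# Chance the night's count lands within 10% of the (perfectly known) mean.
p_close = sum(poisson_pmf(k, lam) for k in range(18, 23))
print(relative_sd)  # about 0.22
print(p_close)      # about 0.42
```

Even with a perfect model, the count lands within 10% of its mean less than half the time.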

View Article

Visualizing DBSCAN Clustering

A previous post covered clustering with the k-means algorithm. In this post, we consider a fundamentally different, density-based approach called DBSCAN. In contrast to k-means, which modeled clusters...
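Below is a basic DBSCAN sketch in pure Python, not the article's code. Points with at least `min_pts` neighbors within `eps` (counting themselves) are core points. Clusters grow outward from core points. Anything unreachable from a core point is labeled noise (-1). The neighbor search is a quadratic-time scan, for clarity.

```python
import math

def dbscan(points, eps, min_pts):
    """Basic DBSCAN. Returns one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)

    def neighbors(i):
        # O(n) scan; a spatial index would make this faster.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later turn out to be a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also a core point: keep expanding
    return labels
```

Unlike k-means, this needs no cluster count up front. It can find non-convex clusters, but it depends on choosing `eps` and `min_pts` sensibly.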

View Article

Machine Learning over JSON

Supervised machine learning is the problem of approximating functions X -> Y from many example (x, y) pairs. Now, the vast majority of supervised learning algorithms assume that X is p-dimensional...

View Article


OHMS Lessons Learned

Note: I found the following post as an almost complete draft as I was reading some of my unpublished posts. I wrote it around October 1st, 2013, at the beginning of what would end up being my last year...

View Article

Desperation Motivated Creativity

I am not the strongest climber. Some of the people I've climbed with are so strong that they can do a one-arm pull-up, and then--while locking off with one arm--sing the "Head, Shoulders, Knees and...

View Article

Why I'm Making Tauthon

For the past two months I've been spending half my time on Tauthon. Tauthon is a backwards-compatible Python interpreter that runs Python 2 code and C-extensions exactly as-is, while also allowing...

View Article


An Easy Chess Puzzle

I was looking through Markovian, my old chess engine, recently, and came across the first game it won against another chess engine. Stepping through the game, it seems that both engines actually played...

View Article


Continuous Time Lending

Assume a borrower takes out an installment loan of size $1$ and makes continuous-time payments on it. The installment loan starts at time $0$, ends at time $T$, and has an interest rate of $r$,...
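The setup is cut off here, so take two assumptions. First, payments arrive at a constant rate $P$. Second, the outstanding balance $B(t)$ accrues interest continuously at rate $r$. Under those assumptions the payment rate follows from a first-order ODE:

```latex
\frac{dB}{dt} = r\,B(t) - P, \qquad B(0) = 1, \qquad B(T) = 0
\;\Longrightarrow\;
B(t) = e^{rt} - \frac{P}{r}\left(e^{rt} - 1\right),
\qquad
P = \frac{r}{1 - e^{-rT}}.
```

As $r \to 0$ this gives $P \to 1/T$, the rate for repaying the loan evenly with no interest.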

View Article

Day-to-Day Operations of Palo Alto

Palo Alto runs a pretty open city government, with a number of interesting documents available for download on their website. Of particular interest are their annual budgets and annual financial...

View Article

Implementing "nonlocal" in Tauthon: Part I

Tauthon is a fork of Python 2.7 with syntax, builtins, and libraries backported from Python 3. It aspires to be able to run all valid Python 2 and 3 code. In this article, I begin discussing how I was...
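For reference, this is the Python 3 behavior `nonlocal` provides: it lets a nested function rebind a variable in an enclosing function's scope. The sketch runs under Python 3. How the article implements this in Tauthon is not visible in this excerpt, and `make_counter` is a made-up example.

```python
def make_counter():
    count = 0

    def increment():
        # Without `nonlocal`, the assignment below would make `count` local
        # to increment(), and `count += 1` would raise UnboundLocalError.
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
counter()
counter()
print(counter())  # 3
```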

View Article

Nontrivial: Exception Handling in Python

Most code can fail in multiple places and for multiple reasons. Handling these failures seems pretty trivial, something you'd cover in the basic tutorial for your programming language. Actually, I think...

View Article


Logistic Regression Isn't Interpretable

Suppose two events A and B are independent, with the odds of A occurring being 4, and the odds of B being 5. What are the odds of both A and B occurring? I'll give you a hint: it's not 20.
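The arithmetic behind the hint converts odds $o$ to a probability via $p = o / (1 + o)$, multiplies the probabilities by independence, and converts back. Exact fractions avoid rounding.

```python
from fractions import Fraction

def odds_to_prob(odds):
    return odds / (1 + odds)

def prob_to_odds(p):
    return p / (1 - p)

p_a = odds_to_prob(Fraction(4))  # 4/5
p_b = odds_to_prob(Fraction(5))  # 5/6
p_both = p_a * p_b               # 2/3, by independence
print(prob_to_odds(p_both))      # 2, not 4 * 5 = 20
```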

View Article

The Communication Loss Function

My first ever task writing software professionally was to make some small change to the Kaggle server. I spent a day or so painstakingly moving down the call stack from the API endpoint to...

View Article


Hypothesis Tests for Machine Learning

Statisticians have spent a lot of time attempting to do complicated inference for various machine learning models. In fact, there's an enormously simple and naive way to do this in complete generality:...

View Article

Style for Python Multiline If-Statements

PEP 8 gives a number of acceptable ways of handling multiple line if-statements in Python. But to be honest, most of the styles I've seen--even those that conform with the PEP--seem ugly and hard to...
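For reference, here are the three styles PEP 8 lists for a long if-condition. They are adapted from its examples into runnable functions, with `do_something()` replaced by a return value. The article's own preferred style isn't visible in this excerpt.

```python
def no_extra_indentation(this_is_one_thing, that_is_another_thing):
    # No extra indentation.
    if (this_is_one_thing and
        that_is_another_thing):
        return "did something"
    return "skipped"

def with_comment(this_is_one_thing, that_is_another_thing):
    # Add a comment, which provides some distinction in editors
    # supporting syntax highlighting.
    if (this_is_one_thing and
        that_is_another_thing):
        # Since both conditions are true, we can frobnicate.
        return "did something"
    return "skipped"

def extra_indentation(this_is_one_thing, that_is_another_thing):
    # Add some extra indentation on the conditional continuation line.
    if (this_is_one_thing
            and that_is_another_thing):
        return "did something"
    return "skipped"
```

In the first two styles the continued condition lines up with the body, which is the visual ambiguity PEP 8's comment and extra-indentation variants try to address.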

View Article


Map Transformation

Note: I was describing my map transformation project to Sasha Trubetskoy recently, who is even more into maps than I am. This is a project I completed in March 2011 with Shir Yehoshua for a CS class,...

View Article

Why I've Liked Chess So Much

I've liked chess off and on for twenty years. I don't remember learning to play but I do remember an after school class and losing repeatedly to my dad and cousin when I was little. When I taught...

View Article

Gross Receipts Taxes as Payroll Taxes

A common critique of gross receipts taxes (where the government charges a small fixed percent of a business's total revenue) is that they have a disproportionate impact on low margin businesses. (For...
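Hypothetical numbers make the critique concrete; the business types, tax rate, and margins below are invented for illustration. A tax on revenue consumes a fraction rate / margin of pre-tax profit, so the same rate bites far harder on a low-margin business. This illustrates only the critique, not the article's response to it, which is cut off here.

```python
def tax_share_of_profit(revenue, margin, gross_receipts_rate):
    """Fraction of pre-tax profit consumed by a tax on total revenue."""
    profit = revenue * margin
    tax = revenue * gross_receipts_rate
    return tax / profit  # simplifies to gross_receipts_rate / margin

# Hypothetical businesses, each with $1,000,000 in revenue and a 0.5% gross receipts tax.
grocery = tax_share_of_profit(1_000_000, 0.02, 0.005)   # 2% margin
software = tax_share_of_profit(1_000_000, 0.30, 0.005)  # 30% margin
print(grocery, software)  # about 0.25 vs about 0.017
```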

View Article