Recipes as Searching the Space of Algorithms

When I was leafing through a cookbook yesterday, it finally hit me why I love recipes: They are algorithms that are robust to small variations and mistakes, and more often than not, they benefit from them. They have further nice properties: They are empirical algorithms, obtained not from a theory but by searching through the space of algorithms. We have no idea why a particular recipe tastes the particular way it does; we have no theory for it, and no recipe was derived from a theory. Rather, some people have only recently started trying to work out a theory from the empirical recipes handed down to us by our ancestors.


This coincides with a theme I loved at NIPS 2016. Normally, in machine learning, we derive algorithms from theories: We have mathematical models and variables we want to learn about, and in order to estimate them, we derive algorithms from the mathematical model (the theory). But if, in practice, the only real entity is the algorithm and not the model, why bother with the model in the first place, if we can search the space of algorithms directly? Recipes are fascinating examples of this attitude. Instead of being derived from The Scientific Theory of Food, Taste and Molecular Chemistry by a Professor in the Department of Molecular Food Chemistry, recipes are extremely effective heuristics developed through trial and error, and they have dominated the theory by giving us tastes we would never have found by modelling chemicals. Anybody who enjoys the taste of food doesn’t care about the underlying chemistry (I don’t!), because once you have the recipe, the chemistry is an additional curiosity, something you “would love to learn” if you have no other thrills in your life (but trial and error in the kitchen is far more enjoyable than food chemistry, sorry). Interestingly enough, this idea has started to be taken seriously in the ML community. Instead of developing poor, inexpressive mathematical models with our poor imagination, why not learn “the code” or “the algorithm” directly, at the expense of ending up with an inexplicable, black-box model? As long as this solves the problem to our satisfaction, why would we care about the underlying complex and inexplicable mathematical model?
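To make “searching the space of algorithms” concrete, here is a toy sketch in Python, not taken from any of the NIPS work mentioned here: brute-force enumeration of short compositions of a few primitive operations, keeping the first one consistent with a handful of input-output examples. The primitive set (`inc`, `double`, `square`, `neg`) and the helper names are hypothetical; real program-induction systems search far larger spaces, usually with learned guidance rather than plain enumeration.

```python
from itertools import product

# Hypothetical primitive operations: the "ingredients" of a tiny algorithm space.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "neg": lambda x: -x,
}

def run(program, x):
    """Apply a sequence of primitive names to x, left to right."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def search(examples, max_len=3):
    """Return the shortest program consistent with all (input, output) pairs, or None."""
    for length in range(1, max_len + 1):
        # Iterating a dict yields its keys, so each candidate is a tuple of names.
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# No model of "why" the target behaves as it does: only input-output examples.
found = search([(1, 3), (2, 5), (3, 7)])
print(found)           # ('double', 'inc')
print(run(found, 10))  # 21
```

Like a recipe, the program found this way comes with no theory of why it works; it is simply the shortest thing in the search space that reproduces the examples.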

Before leaving interested readers with some relevant links, I would like to note that this kind of approach has led to a discussion about interpretability. People are annoyed by this kind of thinking because in the end you have an algorithm that works but you have no idea why it works; well, exactly like recipes. I also discussed this at length with a friend of mine at NIPS: I think interpretability is not one of the goals we need. Any interpretation is likely to be a deception for any human being; interpretations are for people who love niceness, beauty, nice interpretations and similar constructs. I don’t. But I do not want to elaborate on this point with more psychology, at least not now.

Here are some links about algorithms that can learn algorithms; you can browse further through the connections these links lead to.

Neural Abstract Machines & Program Induction

Recurrent Neural Networks and Other Machines That Learn Algorithms

Deep Reinforcement Learning Workshop

Bayesian Deep Learning

Statistics is not an end, it is the beginning

The story is old by now: We humans see patterns in random sequences of numbers, and when we observe a chain of events, we tend to mistake correlation for causation and draw nonexistent inferences; we overtheorize. The evolutionary advantage of this instinct is obvious, so I will not elaborate. Why, then, is it so often called a bias nowadays? Because we are living under the “big data” regime, an age quite different from the one our ancestors lived in: an age of noise explosion and explosive noise, an age where the winner takes all and which actually produces winners who can take all. Every day we are bombarded with data, with very few actually working (statistical or heuristic) models to handle it.

If you think about it, the matter is pretty serious. It is directly relevant, and central, to our daily lives. Humans see patterns in random sequences of numbers. You don’t have to think of misinterpreted government statistics, or of p-babble in a psychology experiment or a scientific paper that misinterprets it all. If it were limited to those bureaucratic issues, I wouldn’t call it central. But it is central.

The real insight is that we have become cyborgs. We don’t have chips in our brains yet, but technically we are not very different. Our email inboxes, Facebook profiles and WhatsApp usage are all electronic extensions of us, and we are now more than a mere collection of tissues; we live in a data flow, in the middle of numbers. As another example, I routinely outsource some of my thinking to the computer, plotting functions instead of imagining them and actively using the computer to extend my reasoning.

Inside this data flow, being fooled by numbers is a huge problem. For example, if you have ever been obsessed with another human being and stalked him/her across various social media platforms to get information, I am sure you get the point. In hindsight, you probably found plenty of spurious correlations, made ridiculous fits to some ridiculous shared links and drew inferences based on them. Mostly you reached stupid conclusions, and this made you miss the relevant information. In this age of information, we are bombarded with data, yes, but not only do we not know how to draw inferences from data, we are also unaware that we usually don’t see the most relevant data. In short, we use our regressors to fit noise.
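A minimal sketch of what “fitting noise” looks like, using only Python’s standard library: generate a pure-noise target and many pure-noise candidate series, then report the strongest correlation found among them. The sizes (20 observations, 500 candidates), the seed and the function names are illustrative assumptions, not from the text.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_spurious_correlation(n_obs=20, n_candidates=500, seed=0):
    """Search pure-noise candidates for the one most correlated with a pure-noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(candidate, target)))
    return best

print(f"strongest |r| found in pure noise: {best_spurious_correlation():.2f}")
```

Nothing here is related to anything else, yet searching enough candidates reliably turns up a correlation that looks impressive in isolation; this is the stalker’s (and the data miner’s) trap.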

If you don’t think clearly, you will misunderstand everything. Let me open a parenthesis here: If everybody misunderstands everything, why doesn’t everyone mess up? After all, if you make decisions based on your stupid understanding, you should mess up at some point. But fools are doing pretty well. The thing is that a person’s trajectory depends very much on luck, and even more on the optionality that comes from the family, so much so that personal idiotic inferences are irrelevant. That is, we are robust to our own stupidity, as society figured out heuristics for that long ago. However, this doesn’t mean it is OK not to know statistics, because (thankfully) things are getting weirder and weirder, and the world will look more volatile and more dominated by explosive noise in the future than it used to. Thinking probabilistically is essential if you don’t want to devote your life to some fluctuation in history (and believe me, this can happen; it was not possible 10 years ago, but it is becoming more and more probable).

We really need statistical reasoning.

Lately, I have been trying to develop some actually systematic thinking against this noise: when looking at the news, in my personal life, when deciding what to research, when watching financial bumps. In this mess, you need to fortify your mind with probabilistic thinking. The more I tried to actually use statistical reasoning in my life, the more I realised that it is best used negatively. I actually knew this idea from the beginning, but it took a long time to really sink in. The following tweet was the end of that process and made it clear.

Exactly! Taleb nails it once again! This is why it is mostly hopeless to use statistical methodology in a positive way, to claim relationships. It is best used to refute relationships, to “denoise”, to avoid being fooled by randomness.
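One standard way to use statistics negatively is a permutation test: shuffle one variable to destroy any real link, and ask how often the shuffled data look at least as related as the real data. If shuffling matches the pattern routinely, the claimed relationship has not earned belief. The sketch below is a generic two-sided test on Pearson correlation in plain Python; the function names, the 999 permutations and the synthetic data are assumptions for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def permutation_p_value(xs, ys, n_perm=999, seed=0):
    """Fraction of shuffles of ys whose |r| with xs is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction: the observed data count as one of the permutations.
    return (extreme + 1) / (n_perm + 1)

rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(30)]
noise = [rng.gauss(0, 1) for _ in range(30)]              # unrelated to xs
trend = [2 * x + rng.gauss(0, 0.5) for x in xs]           # genuinely related to xs
print(permutation_p_value(xs, noise))  # usually large: shuffling does about as well
print(permutation_p_value(xs, trend))  # small: shuffling almost never matches it
```

Note the asymmetry: a small p-value only says the pattern is hard to produce by shuffling, while a large one is a clean way of saying “this may well be noise”.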

Let me conclude. I love discoveries, and this is why I wanted to make science my career. But it seems to me that real discovery is not statistical; it will not come solely from data, though you need rigorous probability and statistics along the way in order not to be stupid. Discovery is based on dumb luck and on being in the field, whatever field you want to be in (Hint: choose “low variance” fields to get lucky, if you know what I mean). Discovery is based on excessive tinkering, trial and error, testing, experimenting and rejecting spurious hypotheses while keeping the more sensible ones and assigning them probabilities. This is why I would first like to learn statistical methodology by heart, and then quit the field of computational statistics. What would I do next? I can’t say. After all, I am now trained to think probabilistically with all my atoms.

We need exponential forgetting in the world

There are some problems in the world that make me sick every day. I wish I could avoid them and stop thinking about them, but they keep coming back. Writing them up has always helped, and I guess it always will, so let’s discuss one of them here.

Recently, I was in Barcelona, and while we were sitting in a cafe with friends, we started discussing social inequality. The topic came up from Piketty’s latest book, which a friend of mine had delved into, taking lots of notes. At some point, I was asked for my definition of an equal world. My answer was predictable by my standards: I don’t believe everyone can be equal deterministically, but we should be equal in probability.

[Photo: WFP Operations in Homs]

Let’s think mathematically. I like to consider each individual as a Markov chain, and I denote individual $p$ at time $k$ by $X_k^p$. For the time being, let’s say this random variable measures how “good” you are. Also let $\pi_k^p$ denote the distribution of $X_k^p$, the well-being of individual $p$ at time $k$.

Everybody has an initial condition $X_0^p$: We are born in certain countries, into a certain amount of wealth, and into socially stable or unstable populations. For Markov chains, it is customary to assume this is random as well, so we have an initial distribution $\pi_0^p$. Technically, the problem that makes me sick every day, harms me, bothers me and gives me great unrest is the following: Given this initial distribution for an individual $p$, the distribution at any time $k$, i.e. $\pi_k^p$, depends extremely heavily on $\pi_0^p$ in this world. We are very sensitive to, and dependent on, initial conditions: The wealth of your family, your nationality, your first language. In a fair world, we would have the exponential forgetting property (defined for hidden Markov chains). This means that, given two different initial distributions $\pi_0$ and $\pi_0'$ and observations from the world $y_{1:k}$, the difference between the distributions at time $k$ obtained from these initial distributions (respectively $\pi_k$ and $\pi_k'$) should decrease exponentially with time. In other words, $\|\pi_k - \pi_k'\|_{TV} \to 0$ should happen exponentially fast, i.e. $\|\pi_k - \pi_k'\|_{TV} \le C \rho^k$ for some constants $C < \infty$ and $\rho \in (0, 1)$.

Intuitively, this means the following: Given any two initial distributions, no matter how bad they are, you should converge to the correct distribution you should have at time $k$. We are Markov chains, so we depend on our past, but not too much, and the dependence should fade with time. To be fair, the world already has some amount of this property; the situation is not the worst if you look at history. But we’re still far from perfect.
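A small numerical sketch of forgetting, with a caveat: it shows the simpler, unobserved case, a plain Markov chain with no observations $y_{1:k}$, rather than the hidden-Markov filter the definition above refers to. The 3-state transition matrix `P` (“poor”, “middle”, “well-off”) is entirely hypothetical. Two people start from opposite ends, and we track the total variation distance between their distributions at time $k$.

```python
def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv_distance(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical 3-state "well-being" chain (poor, middle, well-off); rows sum to 1.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

def tv_trajectory(pi0, pi0_prime, P, steps):
    """TV distance between the two chains' distributions at times 0..steps."""
    out = []
    p, q = pi0, pi0_prime
    for _ in range(steps + 1):
        out.append(tv_distance(p, q))
        p, q = step(p, P), step(q, P)
    return out

born_poor = [1.0, 0.0, 0.0]
born_rich = [0.0, 0.0, 1.0]
for k, d in enumerate(tv_trajectory(born_poor, born_rich, P, 10)):
    print(k, round(d, 4))
```

For this particular `P`, any two rows share at least half of their probability mass (Dobrushin’s contraction coefficient is 0.5), so each step at least halves the distance and it stays below $0.5^k$. A matrix closer to the identity, i.e. a world with more inertia, contracts far more slowly.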

And then, of course, we moved on to discussing how to achieve this. My solutions are never based on using government force to enforce it; I am strongly against using any central force to achieve anything. Instead, I shared my recipes with my friends, as only real friends will be entitled to hear them, I guess, for a long time.

How can we make the world more stochastic and more insensitive to initial conditions, so that everybody has a chance to thrive, not only those who are born in rich countries and can spend ridiculous amounts of money on mostly useless education? Answer it for yourself. But in any case, I would advise you to get ready for a stochastic future, because I’m pretty sure we’ll do at least something about it.

Decisions in the face of an enemy

Whenever we decide to play basketball, our plans turn out to be naive and we usually end up not being able to play. In winter, it was usually the rain. One week we decided (on Monday) to play on Thursday, and I decided to go to Turkey on Tuesday! Another week I got sick, and last week it started to rain out of nowhere: No forecast! Some friends think the real problem is me: The Goddess of Luck is pissed off because of me. It is not improbable, so let’s see what we should do when the Gods mess with us.

The problem is similar to what machine learning people study under the name of “adversarial settings”. You have some possible actions $a_t \in \mathcal{A}$ at time $t$, and an adversary takes an action $z_t$ against you. The loss you observe is a function of the pair $(a_t, z_t)$. In other words, you take an action, you face an adverse action against you, and depending on both you incur some loss or reward $y_t = f(a_t, z_t)$.

What machine learning people correctly identified is the following: In an adversarial setting, i.e. when someone really tries to mess with you (the Gods here, or the city itself in Istanbul or London), any deterministic policy $(a_t)_{t \geq 1}$ will result in disaster. Why? Because your enemy can then choose the worst possible sequence $z_t$ for you, since she knows what you will be doing at any given time. To escape the enemy, you should randomise your policy $a_t$ and simply be random (but in a controlled manner). It is exponentially harder to prevent a random event than a deterministic one. This layer of uncertainty prevents your enemy from guessing exactly what you will be doing, and hence significantly limits your losses.

Even Gods have limited powers. See, it is easy to make it rain when you know the exact time these guys will be playing. But when we add a layer of uncertainty over the days we’ll be playing basketball, no God can make it rain for 3 days in the middle of summer. What we should do is plan to play at multiple times, and nobody should know the actual time in advance, since one cannot hide information from the Goddess of Luck. The best policy here is probably simply to go and play at some random available time, decided opportunistically (preferably when the temperature is not over 35°C!).
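The basketball argument can be checked with a toy model in Python, under simplifying assumptions that are mine, not the text’s: each week there are five candidate evenings, the adversary can spoil exactly one of them, and it knows our policy (the probability of each evening) but not the day we actually draw. Against a known policy, its best response is to spoil the likeliest day, so our worst-case chance of being rained out is the largest probability in the policy. This is a one-shot minimax calculation plus a Monte Carlo check, not one of the adversarial online-learning algorithms (such as Hedge or EXP3) from the ML literature.

```python
import random

def worst_case_loss(policy):
    """Chance of being rained out when the adversary knows our policy (a probability
    per day) but not the realised draw: it spoils the day we are most likely to pick."""
    return max(policy)

def simulate(policy, n_weeks=10_000, seed=0):
    """Monte Carlo check: each week we draw a day from `policy`, and the adversary
    makes it rain on the day with the highest probability under `policy`."""
    rng = random.Random(seed)
    days = range(len(policy))
    rain_day = max(days, key=lambda d: policy[d])  # best response to a known policy
    rained_out = 0
    for _ in range(n_weeks):
        our_day = rng.choices(days, weights=policy)[0]
        rained_out += our_day == rain_day
    return rained_out / n_weeks

n_days = 5                                   # hypothetical: five free evenings a week
always_thursday = [0, 0, 0, 1, 0]            # deterministic plan
uniform = [1 / n_days] * n_days              # randomised plan

print(worst_case_loss(always_thursday))      # 1
print(worst_case_loss(uniform))              # 0.2
print(simulate(always_thursday), simulate(uniform))
```

The uniform policy is the one that minimises the largest probability, so it is the best we can do in this model. Over $T$ independent weeks, the chance that the Goddess ruins every single game falls as $(1/5)^T$, which is one way to read the “exponentially harder” remark above.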

It is not easy to mess with the Goddess of Luck. But since she is obsessed with me to some extent, I’m learning to strike back the hard way!

Humans benefit from cognitive errors

We are seeing a huge literature on cognitive biases and errors. Most of it is about demonstrating and describing biases, and only a few works are about avoiding them. When we hear about cognitive biases, we immediately think about how to avoid them (without even realising it). However, Kahneman, for example, states in his book that there is little hope for this. And I think there is a reason for it.

Recently, it hit me that avoiding cognitive errors is itself an error! Yes, every one of us fits a narrative to his/her past and recollects events in a certain way (forgetting the forgettable) in order to construct it. And almost all of these stories are wrong; we choose our stories, so to speak, based on our imagined future. Reality, at least personal reality, is a cultivated fiction.

I started to think that keeping this in mind and trying to avoid this process is itself a fallacy, because cognitive errors are what let us thrive in the world and move on, rather than getting stuck in a particular sequence of past events. Cognitive errors are similar to statistical model errors, but humans have them because they benefit from them (unlike statistical models). Having an erroneous narrative doesn’t hurt anyone as long as the person becomes better thanks to it.

There is, clearly, a compromise problem. Most people don’t care at all how they reconstruct their stories in order to avoid everything that gives them pain. Consequently, they can justify almost any kind of action (as long as it is compatible with the norms of society), and this is no different from being evil on purpose.

Based on this compromise problem, here is a thought: Somebody should lay down a framework of heuristics that enable us to benefit from our cognitive errors while staying ethical. Should I write a book on this?

Narratives

Every now and then, we see in the science news feed that researchers did such and such, and thus in the near future we will be able to remove or generate memories. It is claimed that, by doing so, we will be able to remove unhappy memories and be left with happy ones, which will supposedly make us happier.

People usually find these stories terrifying, feeling that they would no longer be human if their memory were not intact.

But it seems that nobody is aware of the frustrating fact that most humans already do this quite regularly.

Life is mostly about what we choose to remember (and keep) from our past. The cognitive mind is very skilful at remembering only the memories that fit our current narrative. We first narrate something, we want to believe in it (or our environment forces us to), and then the cognitive mind starts to filter out all the unnecessary memories that conflict with the narrative. Likewise, it recalls all the related memories that are now useful, and even if they are too weak on their own to support the current narrative, it magnifies them, making them more emotional and far more powerful than they are.

As Harari nicely points out, humans are very, very good at believing in a myth and its corresponding narrative, and the cognitive mind takes care of the rest. Even more powerful is that humans are able to switch between narratives, since the cognitive mind will do its job of filtering out irrelevant past experiences in order to construct the future.

The only glitch I have not yet understood is that there seems to be some layer of reality we cannot avoid. We discuss removing memories in science news because sometimes we are not able to forget them, as there is a layer of reality that the cognitive mind cannot avoid. Here is my explanation: In actuality, you never forget the memories that don’t fit your current narrative; you just choose not to remember them. As life goes on, it brings new problems and situations that force you to remember what you don’t want to remember. This is the unsurpassable, unavoidable complicatedness of life: You never know what the fortune lady brings. So no matter how hard you try to believe in your new narrative, there is a decent chance that life will filter out what doesn’t fit reality. If your narrative is not grounded in powerful facts, it may collapse. This is the part we cannot simply simulate, as we cannot create facts. Since going beyond this point (creating facts) is classified as a cognitive disease, I will end the discussion here.

The Problem of Deduction

Hume formulated the problem of induction, which applies to our knowledge and to the inductions we can make from mere observations. So far, little cure is available for the problem of induction: All statistical methods are still powerless in the face of a really complicated inverse problem. We will solve it, but not today.

Today it hit me that a similar problem, which I will call the problem of deduction, applies to the human condition and the social sciences just as the problem of induction applies to the natural sciences. I am sure there are epistemologists who have already discussed this, but I am keeping this post self-contained; I will study the connections later.

Here is the problem of deduction: When we want to decide on and change something in social matters, regarding another person’s feelings, or to solve a social problem, we usually find thinking useful. Thinking, here, means deducing something about a complicated social situation using your reason and some cause-and-effect assumptions. However, in this business thinking is particularly useless, because the deductions are mostly bullshit. You may think that person X feels depressed because of reason Y and hence deduce that the cure is to say or do Z. But this deduction is pure bullshit (I mean the literal meaning of the word), because it is not possible to deduce anything in these matters. Nobody can say anything deductive, with certainty, about how we feel and how we will feel in the face of a new event. Therefore any action based on these deductions and on rational thinking will be baseless, and it will mostly result in ridiculous situations. The complexity of social life and of the human condition takes any kind of rational reasoning off the table.

Humankind has desperately tried to find a solution to this problem. You can tell at a glance that it is not solved just by looking at the variety of psychological treatments: For example, cognitive psychiatry has its own methods of deduction and its own cause-and-effect theory of social matters, whereas psychoanalysts developed completely different methods of deduction. A psychopharmacologist will approach it in yet another completely different way. All other types of psychological treatment also rely on particular kinds of cause-and-effect assumptions, and they try to provide a rational guide for deducing “realities” about humans.

If this problem of deduction exists, how do we survive, and how do we do as well as we do? I wouldn’t say we are doing very well with each other, if you look at how we treat each other. We are very bad at understanding the feelings of the people around us, and this causes a lot of trouble. But, after all, we are not that bad. We are not truly powerless, and this is not as complicated as the problem of induction, because we humans are very good at perceiving signals, deep signals, from other humans: Evolution has trained us empirically over millions of years. We are very good at recognising empirical regularities, and our feelings guide us fairly well in social situations. Do not contaminate this with too much thinking, rely on empirical regularities, and you will have a partial solution to the problem of deduction. This is unlike the problem of induction, where our intuitions fail spectacularly.