Artificial Life

I want to share a research problem I have been working on for a number of weeks. I created artificial lifeforms that live in a simple world, and gave them the capacity to reproduce, recombine, and mutate. After I started getting some pretty fun results, I decided to build a JavaScript visualizer which you can use to run the simulations in your own browser.

I’m interested in creating social intellects with the capacity to autonomously generate models to handle novel situations, and with the ability to learn from observation. Both of these problems are difficult to describe a cost function for, so I’m instead leveraging life itself as the cost function and generating environments that will encourage development of the desired traits. Before I run off and make virtual dogs though, I thought it would be prudent to build some simpler life forms.

 

Uckeleoid

Introducing the Uckeleoid (pronounced like nucleoid without the ‘n’). The Uckeleoid has only a few pieces of information available to it: its gender, whether it’s facing another Uckeleoid, whether it’s facing food, whether it’s facing a wall, and how well fed it is.
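For concreteness, here is one way those five inputs might be encoded as a feature vector for the Uckeleoid's neural network. This is my own illustrative sketch; the class and field names are invented, not taken from the actual simulation code.

```python
from dataclasses import dataclass

@dataclass
class UckeleoidSenses:
    """Hypothetical encoding of the five sensory inputs described above."""
    is_male: bool            # gender
    facing_uckeleoid: bool   # another Uckeleoid directly ahead?
    facing_food: bool        # food directly ahead?
    facing_wall: bool        # wall directly ahead?
    fed_level: float         # 0.0 (starving) .. 1.0 (full)

    def as_vector(self) -> list[float]:
        # Flatten to the numeric inputs a small network would consume.
        return [
            float(self.is_male),
            float(self.facing_uckeleoid),
            float(self.facing_food),
            float(self.facing_wall),
            self.fed_level,
        ]

senses = UckeleoidSenses(True, False, True, False, 0.8)
print(senses.as_vector())  # [1.0, 0.0, 1.0, 0.0, 0.8]
```

With so few inputs, the interesting behavior has to come from how the network maps this tiny vector onto actions, which is exactly what the evolutionary pressure below shapes.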

I originally wrote this code to run in Java at many thousands of simulation ticks per second, but the JavaScript visualizer here (which is still running the full simulation math) simulates a single tick per second.

Uckeleoids live for 20,000 simulation ticks, or 5 and a half hours at 1 tick per second. They have a gestation period of 2,000 simulation ticks (half an hour) and, assuming restful behavior, can survive without food for 2,000 simulation ticks (again, half an hour). Red Uckeleoids are male and blue Uckeleoids are female.
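The tick-to-wall-clock arithmetic above is easy to check; the constants here simply mirror the numbers in this post.

```python
TICKS_PER_SECOND = 1      # visualizer speed
LIFESPAN_TICKS = 20_000   # Uckeleoid lifespan
GESTATION_TICKS = 2_000   # also the starvation limit while resting

lifespan_hours = LIFESPAN_TICKS / TICKS_PER_SECOND / 3600
gestation_minutes = GESTATION_TICKS / TICKS_PER_SECOND / 60

assert 5.5 < lifespan_hours < 5.6     # "5 and a half hours"
assert 33 < gestation_minutes < 34    # roughly half an hour
```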

I have saved some files snapshotting the development of the prototype Uckeleoids over the course of a few hundred generations. Let’s take a look at those now; you’re free to load each of these snapshots in your browser and observe the differences in behavior. It is worth noting that the simulations are not deterministic, and because the full simulation code runs in the browser, you could run these yourself and achieve similar results if you left your computer on for long enough, though the details would of course differ.

Tick 0 (load snapshot): The world starts with 87 Uckeleoids in a 25×25 world. Each Uckeleoid has an age of 0 to start, so the first deaths due to old age will come in 5 1/2 hours. Their neural networks are all initialized to give an equal chance of each action (rest, turn left, turn right, move forward, eat, and breed). The random initialization is not at equilibrium yet, and the population would grow to around 200 even if mutations were turned off.
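As a sketch of what "an equal chance of each action" can mean in practice: if the network's output scores start at zero, a softmax over the six actions yields a uniform distribution. This is an illustration of the idea, not the simulation's actual initialization code.

```python
import math

ACTIONS = ["rest", "turn_left", "turn_right", "move_forward", "eat", "breed"]

def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Zero-initialized output scores -> every action equally likely.
scores = [0.0] * len(ACTIONS)
probs = softmax(scores)
assert all(abs(p - 1 / 6) < 1e-9 for p in probs)
```

From this uniform starting point, selection alone (even without mutation) can shift the population, because individuals whose random weights happen to favor eating and breeding leave more descendants.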

Tick 18,000 (load snapshot): Five hours into the simulation, 17 of the original 87 Uckeleoids are still alive. These remaining progenitors will die of old age within half an hour, if starvation doesn’t claim them first. The population has swelled to 200, despite very little change in the average neural networks of the Uckeleoids.

Tick 80,000 (load snapshot): After the simulation has been running for almost a full day, the oldest members of the population are roughly 15th generation. The population is beginning to show biases against turning away from food, resting, and … turning right. What? Yeah, the population has a consistent bias against turning right.

Tick 280,000 (load snapshot): Three days and more than 50 generations in, the bias against turning right has become an ironclad rule. Uckeleoids simply don’t turn right anymore. Strangely, the population still hovers around 200. It seems that while individuals might be more fit, they haven’t tuned themselves enough to eke much extra efficiency out of the environment.

Tick 2,000,000 (load snapshot): Our last snapshot is slightly after 23 days. This is 100 Uckeleoid lifetimes, and the eldest members are approximately generation 350. Uckeleoids never rest, never turn right, and no longer attempt to eat when not facing food. The population holds stable at over 250 Uckeleoids.

Okay, what did we learn? I think my favorite piece about this is that the population will always develop a turning preference. Always. You can run this a hundred times, and each time a turning preference will be apparent by 300K ticks. I didn’t expect to see this behavior, but when you think about it, a Uckeleoid sees very little of the wide world around it, and remembers even less, so turning left and then right just means aging by two ticks. Turning left and then left again still means aging by two ticks, but also potentially having new food, breeding partners, or a new direction to travel ahead. For such a simple life form, indecision about which way to turn is wasteful, so the entire population will always develop a single-minded turning preference. Individuals can’t profitably deviate with a non-standard turning preference, because then their children would turn both ways, or worse, not turn at all.
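The turning argument can be made concrete with a toy heading model (my own illustration, not the simulation code): turning left then right returns you to your original heading after two ticks, while turning left twice spends the same two ticks but leaves you facing somewhere new.

```python
# Four compass headings; turning rotates the heading index.
HEADINGS = ["N", "E", "S", "W"]

def turn(heading_idx: int, direction: str) -> int:
    """Return the new heading index after turning left or right once."""
    step = 1 if direction == "right" else -1
    return (heading_idx + step) % len(HEADINGS)

h = 0  # facing North
# Left then right: two ticks spent, heading unchanged -> pure waste.
assert turn(turn(h, "left"), "right") == h
# Left then left: two ticks spent, but now facing a new direction.
assert turn(turn(h, "left"), "left") != h
```

Either consistent preference (always left or always right) works equally well; the selection pressure is only against mixing the two.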

Where to next? Code exists to give Uckeleoids a few bits of memory, but the meta-analysis code isn’t powerful enough to show what impact, if any, these memory units are having. This means my next step is going to be to write additional data analysis tools, so I can demonstrate what impact, if any, a few bits of memory have. It’s quite possible that memory will not be helpful to the Uckeleoids without a more powerful set of sensors or a more stimulating world.

When all you have is a hammer…

Kaggle Competition: Titanic: Machine Learning from Disaster

When all you have is a hammer, everything looks like a nail. This was a lesson I thankfully learned the very first time I sat down with a Kaggle competition data set. After downloading the data and uploading one of the sample answers as my first submission, I was surprised to see a few suspiciously high scores on the leaderboard. In addition to a couple of scores well over 90%, there were a total of four entries with a score of 100%. Because I was on Kaggle to learn and not to win tutorial competitions, the dubious scores didn’t linger too heavily on my mind. However, when I made my first serious submission, something caught my eye as I was stripping the extra columns out of the test data: a passenger’s name. Wilkes, Mrs. James (Ellen Needs). A quick Google search and, sure enough, Mrs. Ellen Wilkes was a very real person, born on June 16th, 1864, and rescued aboard lifeboat number 16. There’s actually a whole wiki devoted to the RMS Titanic: http://www.encyclopedia-titanica.org/titanic-survivor/ellen-wilkes.html

Suddenly those 100% scores on Kaggle made much more sense. A few users had been clever enough to realize that there was no need to predict the survivors of an event that took place roughly one hundred years prior. If you ever encountered this problem in the real world, the correct answer would undoubtedly be to look the information up, not to build a clever classifier doing complex feature extraction to bump accuracy from 75% to 80%.
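As a toy illustration of the "just look it up" approach (with made-up data, not the actual Kaggle files): match each test-set passenger name against a historical survivor list and predict directly from the match, no model required.

```python
# Hypothetical miniature test set and survivor list; in practice the
# survivor list would come from a source like encyclopedia-titanica.org.
test_passengers = [
    "Wilkes, Mrs. James (Ellen Needs)",
    "Doe, Mr. John",
]
known_survivors = {"Wilkes, Mrs. James (Ellen Needs)"}

# 1 = survived, 0 = did not survive.
predictions = {name: int(name in known_survivors) for name in test_passengers}
print(predictions)
```

Real name matching would of course need fuzzier logic (spelling variants, maiden names), but the point stands: a lookup beats a classifier when the answer key is public.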

There is an idiom that you are likely familiar with. Hindsight is 20/20. Normally it’s used to indicate that past choices are only obvious because an observer knows the results of the choice. In this case though, I might use it to indicate that one should not spend too much effort contemplating answered questions.

Programming Collective Intelligence: Building Smart Web 2.0 Applications

A little more than a month ago I found myself on an airplane to San Francisco for the Fourth of July. Sitting next to me on the plane was a woman reading what at a glance looked like a printed-out research paper on classification methods. Already armed with a basic intuition of K-means, and otherwise eager for more, I asked about what she was reading. My initial hypothesis proved incorrect: she had printed out pages of a book, which she highly recommended. Convinced, I set myself a reminder to order the book when I was back on the ground.

Medium: Book
Name: Programming Collective Intelligence: Building Smart Web 2.0 Applications
Author: Toby Segaran
Publisher: O’Reilly Media

Imagine my surprise when I found the book on Amazon only to realize I had bought a copy a couple of years prior. When I got back to Boston I started reading it on the way to and from the office. With just a few minutes here or there, I was able to read the whole thing in only about three weeks.

Collective Intelligence provides plain-English explanations of common ML algorithms, along with useful practical examples. It also remains approachable to readers lacking a strong background in collegiate mathematics or computer science. It’s very easy to recommend to individuals who really want to try coding a neural network without first seeing a multivariate calculus proof that back propagation works.

The reason I didn’t read it two years ago hinges on the organization of the index. If you look for algorithms in the index, you could easily be led to believe that these topics are not covered by this book. You will instead find the index is organized mostly by potential applications. Neural networks, in particular, are handled in chapter 4, Searching and Ranking. There is an algorithm summary at the back, and the glossary can direct a reader to the relevant pages, but I would still suggest reading cover to cover for someone looking to get a foundation in ML algorithms.

I really enjoyed how simple it was to jump around without reading a previous section. At times the book would go into exercises I had little interest in spending my time implementing, or would cover algorithms which I already understood quite well in both a practical and theoretical manner. In such cases, I found I could simply skip to the next section and read along with no difficulty.

This book is great, but obviously its coverage is finite. In particular, there is no treatment of the underlying math and theory for any of the algorithms presented. If you’re looking for a deeper understanding, you will need additional materials. I would still recommend this book as a starting point, so you know where to apply additional effort and when to try different algorithms in your own adventures.

I would absolutely endorse this book to anyone seeking a quick read that provides high level intuitions about how various common algorithms work or anyone looking for some good tutorials on how to apply these techniques in a practical manner. On the other hand, anyone who already has a functional intuition of these algorithms will find the book redundant, and it should be obvious that anyone with a deep mathematical understanding has little to learn from this title.

Statement of Purpose

As long as humankind has known machines we’ve dreamed of alien intellects, relentlessly precise, flawlessly dispassionate, and yet capable of complex induction. The earliest literary examples of automata date back thousands of years, but these tales were always just tales. It has only been in the past hundred years that these dreams have gone from whimsical fantasy to an honest scientific endeavor, and the literature has exploded in kind. Our collective imaginations have welcomed the likes of The Engine, HAL 9000, Holly, Data, J.A.R.V.I.S., and countless more.

I, like many computer scientists, have also dreamed of strong AI. The first program I wrote after the classic “Hello World” was a simple rock paper scissors game. In high school and college I wrote Othello AIs in my spare time. In the years since moving into industry, I have kept my eye on the state of AI in computers and dabbled in the current practices. However, this remote approach has ultimately proven insufficient to sate my curiosity. In response to this persistent intellectual wanderlust, I have undertaken a sabbatical to study these topics with my full attention. In the short term, it is my intention to become proficient in the algorithms, maths, and techniques used in Data Science and Machine Learning. In the long term, it is my intention to contribute to fundamentally changing how humans interact with machines.

In this blog I will be exploring a small range of topics. As a student of Data Science and Machine Learning I will endeavor to keep these experiences accessible yet informative. I intend to cover a range of fundamental Machine Learning algorithms, giving a brief enough introduction to confer a basic intuition which can then be used to establish a more rigorous mathematical understanding. I will dive deeper into many of these algorithms with coding projects. I also intend to discuss my own (mis)adventures in Data Science, especially but not limited to working with Kaggle competition data. Finally, I will do my best to collect, collate, and review materials I have found useful in my studies.
