We recently caught up with Andrej Karpathy, Machine Learning PhD student at Stanford and the man behind the innovative ConvNetJS - a JS library for training Deep Learning models (mainly Neural Networks) entirely in your browser. We were keen to learn more about his background, the motivation and potential applications for ConvNetJS, and his research agenda ...
Hi Andrej, firstly thank you for the interview. Let's start with your background and how you became interested in Machine Learning and AI...
Q - What is your 30 second bio?
A - I was born in Slovakia and my family moved to Toronto when I was 15. I finished a Computer Science / Physics undergraduate degree at the University of Toronto, went on to do a Master's degree at the University of British Columbia working on physically-simulated animation (think simulated robots), and finally ended up as a Computer Vision / Machine Learning PhD student at Stanford, where I am currently a 3rd year student. Along the way I squeezed in two wonderful internships at Google Research, working on neural nets (Google Brain) for video classification.
Q - How did you get interested in Machine Learning?
A - As an undergraduate I studied Computer Science/Physics and Math with intentions of working on Quantum Computing. I saw that computing applications were going to completely transform the world and I wanted to help create the most efficient computing devices. This meant going down as far as possible, to the quantum level, and utilizing the most elementary physical laws and particles to perform computation. However, as I took my Quantum Mechanics classes it became apparent that I was not having fun. It was too distant, too limiting. I couldn't get my hands dirty.
At the same time, I felt myself consistently gravitating towards topics in Artificial Intelligence. As one of my influential turning points, I remember walking around in a library realizing that there were zillions of amazing books around me and that I wanted to learn everything in all the books and know everything there is to know. Unfortunately, I also realized that this would be, for all practical purposes, a hopeless endeavor: I'm a blob of soft tissue with finite, leaky memory, a slow CPU, and damnit I felt hungry. However, it also occurred to me that if I couldn't learn everything there is to know myself, maybe I could build something that could. I refocused on Artificial Intelligence, and later (after a period of confusion when I was asked to code up graph search algorithms and minimax trees in my AI class), homed in on the branch I felt was closest to AI: Machine Learning.
Compared to my Quantum Computing escapade, I finally felt that the only thing that stood between me and my goal was entirely my own ingenuity, not some expensive equipment or other externalities. Additionally, I realized that working on AI is arguably the most interesting problem because it's the ultimate meta problem: if I was successful in my quest, the AI could in principle learn all about anything, with Quantum Mechanics merely as a relatively insignificant special case.
Q - What was the first data set you remember working with? What did you do with it?
A - It was probably MNIST digit classification, hacking on Restricted Boltzmann Machines while auditing Geoff Hinton's Neural Nets class at the University of Toronto somewhere around 2007. But strangely, I don't remember the class having much impact on me at the time and in fact I remember being dismissive of it. I considered digit classification to be a cute but useless toy problem and I couldn't understand why Geoff got so excited about digits. At the time I wasn't ready to extrapolate what I was seeing to my own motivating examples, including, for example, reading and understanding the content of all books/websites/videos in the entire world.
Q - Was there a specific "aha" moment when you realized the power of data?
A - I think it was a gradual process. After I learned the mathematical formulation of regression and some of the related approaches I started to realize that many real-world problems reduced to it. Armed with logistic regression (but, more generally, the Machine Learning mindset) as a hammer, everything started to look like a nail.
Very interesting background and insights - thanks for sharing! Let's change gears and talk more about Neural Networks...
Q - What are the main types of problems now being addressed in the Neural Network space?
A - Well, first what we do know is that neural nets seem to work very well in fully-supervised regression problems (i.e. learning fixed mappings from input to output) when you have a lot of data and sufficient computational budget. In fact, I would argue that we've learned something more profound: neural nets are in their basic form just non-linear regression and the more general lesson that has emerged is that you can, in fact, get away with formulating scary-looking non-convex objective functions and that it is seemingly possible to "solve" them in real-world scenarios, even with simple first order methods. This is not an obvious conclusion and for a long time many people were worried about local minima, lack of theoretical guarantees, etc. However, the field has recently seen an explosion of renewed interest, in large part due to dramatic improvements obtained on important, large-scale problems in the industry (speech and vision being among the first few).
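To make the "non-linear regression solved with simple first order methods" point concrete, here is a minimal sketch in plain JavaScript (not ConvNetJS; all names are illustrative): a one-hidden-layer network fit by vanilla stochastic gradient descent. The objective is non-convex in the weights, yet the loss still comes down.

    // Minimal 1-hidden-layer regression net trained with plain SGD (illustrative sketch).
    var H = 10, lr = 0.05;
    var w1 = [], b1 = [], w2 = [], b2 = 0;
    for (var i = 0; i < H; i++) {
      w1.push(Math.random() - 0.5); b1.push(0); w2.push(Math.random() - 0.5);
    }

    function forward(x) {
      var h = [], y = b2;
      for (var i = 0; i < H; i++) {
        h.push(Math.tanh(w1[i] * x + b1[i]));   // the non-linearity makes this non-linear regression
        y += w2[i] * h[i];
      }
      return { h: h, y: y };
    }

    function sgdStep(x, t) {                    // one first-order update on (input x, target t)
      var out = forward(x);
      var dy = out.y - t;                       // d(0.5*(y-t)^2)/dy
      for (var i = 0; i < H; i++) {
        var dh = dy * w2[i] * (1 - out.h[i] * out.h[i]);  // backprop through tanh
        w2[i] -= lr * dy * out.h[i];
        w1[i] -= lr * dh * x;
        b1[i] -= lr * dh;
      }
      b2 -= lr * dy;
      return 0.5 * dy * dy;
    }

    // The objective is non-convex in (w1, b1, w2, b2), yet plain SGD drives it down.
    for (var iter = 0; iter < 20000; iter++) {
      var x = Math.random() * 2 - 1;
      var loss = sgdStep(x, Math.sin(3 * x));
      if (iter % 5000 === 0) console.log('iter', iter, 'loss', loss.toFixed(4));  // noisy per-sample loss, trends down
    }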
There are plenty of open questions and exciting directions. I'll list just a few: We haven't really figured out very good ways of taking advantage of unsupervised data (most serious industry applications right now are large, fully supervised neural nets with a huge amount of labeled training data). We still don't have a very satisfying principled approach for optimizing neural nets that is in practice consistently better than a few tricks and rules of thumb that have been developed over the years largely by the "guess-and-check" method (and it is common to rely on vigorous hand motions rather than mathematical theorems to justify these tricks). Neural nets are also in their basic form fixed, feed-forward functions of data and many people are interested in composing them (for example as seen in recursive neural networks), using them in structured prediction tasks, reinforcement learning tasks, or perhaps most interestingly formulating loopy models and training neural nets as dynamical systems (recurrent neural networks). I also think we'll see interesting multi-task learning approaches for embedding different types of data into common "semantic" vector spaces (especially words, n-grams or entire sentences, which is currently a very active and relatively new area of research), techniques for mapping between modalities, etc. It is an exciting time to be in the field!
Q - Who are the big thought leaders? [you can say yourself :)]
A - Haha, maybe if you ask me in 20 years I could list myself, but for now I'll go with a largely uncontroversial answer: Geoff Hinton, Yann LeCun and Yoshua Bengio.
Q - What excites you most about working with Neural Networks?
A - Neural Networks enjoy many desirable properties! Just to list a few: They can be trained online, they are efficient at test time (in both space and time), they are modular (the gradient can be derived locally and decomposes very simply through the chain rule), they are simple, and they work. There are also plenty of interesting connections (pun not intended) to neuroscience and the human brain.
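The "modular" point deserves a concrete illustration: each module only has to know how to map its input forward and how to turn a gradient on its output into a gradient on its input, and composing modules is then just chaining those two calls. A minimal sketch in plain JavaScript (the module names are made up for this example, not ConvNetJS's):

    // Each module exposes forward(x) and backward(dOut); composition is just chaining.
    var Scale = function(w) { this.w = w; };
    Scale.prototype.forward  = function(x)    { this.x = x; return this.w * x; };
    Scale.prototype.backward = function(dOut) {            // chain rule: dL/dx = dL/dout * dout/dx
      this.dw = dOut * this.x;                             // local gradient for the parameter
      return dOut * this.w;                                // gradient passed to the previous module
    };

    var Tanh = function() {};
    Tanh.prototype.forward  = function(x)    { this.y = Math.tanh(x); return this.y; };
    Tanh.prototype.backward = function(dOut) { return dOut * (1 - this.y * this.y); };

    // Compose: f(x) = tanh(w2 * tanh(w1 * x)); gradients flow back module by module.
    var net = [new Scale(0.7), new Tanh(), new Scale(-1.3), new Tanh()];
    var x   = 0.5;
    var out = net.reduce(function(v, layer) { return layer.forward(v); }, x);
    var dx  = net.reduceRight(function(d, layer) { return layer.backward(d); }, 1.0);
    console.log('output', out, 'dOut/dx', dx, 'dOut/dw1', net[0].dw);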
Q - What are the biggest areas of opportunity / questions you would like to tackle?
A - On the application-agnostic side of Neural Networks, I'm always more interested in simplifying than complicating, which is likely attributable to my physics background where all fundamental equations are sweet, beautiful and short. If the core of a (well-designed) Neural Network library is longer than a few hundred lines of code, it means there's still unnecessary complexity and options to get rid of. The models we use are already fairly simple (nets with Rectified Linear Unit activation functions really just come down to repeated matrix multiplication and thresholding at zero), but the training protocol, hyper-parameter choices, regularization and data preprocessing/augmentation tricks are infamous for being a messy dark art that requires expensive cross-validations.
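To illustrate the "repeated matrix multiplication and thresholding at zero" remark, the entire forward pass of a fully-connected ReLU network fits in a few lines of plain JavaScript (a sketch with random weights, just to show the shape of the computation):

    // Forward pass of a fully-connected ReLU net: alternate matrix-vector products
    // with element-wise thresholding at zero. Weights here are random, for shape only.
    function matvec(W, x, b) {                 // y = W*x + b
      return W.map(function(row, i) {
        return row.reduce(function(s, wij, j) { return s + wij * x[j]; }, b[i]);
      });
    }
    function relu(v) { return v.map(function(a) { return Math.max(0, a); }); }

    function forward(layers, x) {              // layers: [{W, b}, ...]
      return layers.reduce(function(h, layer, i) {
        var z = matvec(layer.W, h, layer.b);
        return i < layers.length - 1 ? relu(z) : z;   // no threshold on the final output
      }, x);
    }

    // Example: a 3 -> 4 -> 2 network with arbitrary weights.
    var rnd = function() { return Math.random() - 0.5; };
    var layer = function(nOut, nIn) {
      return { W: Array.from({length: nOut}, function() { return Array.from({length: nIn}, rnd); }),
               b: Array.from({length: nOut}, function() { return 0; }) };
    };
    console.log(forward([layer(4, 3), layer(2, 4)], [1.0, -2.0, 0.5]));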
Sounds like a very interesting time to be in the field, with lots of areas to explore - look forward to keeping up with your work! On that note, let's talk more about your recent project creating a JS library for Deep Learning models entirely in a browser...
Q - Firstly, what motivated you to create it?
A - I noticed that there were many people interested in Deep Learning but there was no easy way to explore the topic, get your feet wet, or obtain high-level intuitions about how the models work. Normally you have to install all kinds of software, packages, and libraries, compile them for your system, and once (and if) you get it all running you can usually look forward to a console window that spams the current average loss across the screen and you get to watch the loss decrease over time. The whole experience, from the user's point of view, is dreadful. I wanted to make the models more accessible, transparent, and easy to play with, and I figured that the browser is the perfect environment.
Q - How did you build it? How does it work?
A - It started off as a fun hack over my Christmas break in Toronto. I had already implemented Support Vector Machines and Random Forests in Javascript, and as I sat in a Tim Hortons one day sipping delicious Canadian coffee I decided it was a good time to write up a Deep Learning library. My initial intuition was that the (convolutional) networks would be too slow, but I ran some quick benchmarks and was blown away by the efficiency I was observing in Chrome with the V8 engine. That encouraged me to continue the development. Along the way I was also labeled as "insane" for working on this project by a good friend of mine, which only made me further step up my efforts - when people call you insane you know you've hit on something interesting!
Q - What Machine Learning processes, models, techniques etc. does it enable?
A - The library allows you to specify and train neural networks. It also supports convolutional neural networks, which are designed specifically for images. Images technically live in a very high-dimensional input space, but they also possess certain simplifying properties that are explicitly taken advantage of to regularize the network: restricted connectivity (local filters), parameter sharing (convolutions), and special local invariance-building neurons (max pooling).
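As a concrete example of what that looks like in ConvNetJS, a small image classifier is specified as a list of layer definitions. The field names below follow the library's published examples, but treat them as a sketch and double-check against the current API:

    // Sketch of a small convolutional net in ConvNetJS (layer/field names per the
    // library's documented examples; verify against the current API before relying on them).
    var layer_defs = [];
    layer_defs.push({type: 'input', out_sx: 32, out_sy: 32, out_depth: 3});  // 32x32 RGB image
    layer_defs.push({type: 'conv', sx: 5, filters: 16, stride: 1, pad: 2,    // local 5x5 filters,
                     activation: 'relu'});                                   // shared across space
    layer_defs.push({type: 'pool', sx: 2, stride: 2});                       // max pooling: local invariance
    layer_defs.push({type: 'conv', sx: 5, filters: 20, stride: 1, pad: 2, activation: 'relu'});
    layer_defs.push({type: 'pool', sx: 2, stride: 2});
    layer_defs.push({type: 'softmax', num_classes: 10});                     // 10-way classifier

    var net = new convnetjs.Net();
    net.makeLayers(layer_defs);
    var trainer = new convnetjs.Trainer(net, {method: 'sgd', learning_rate: 0.01,
                                              momentum: 0.9, batch_size: 16, l2_decay: 0.001});
    // trainer.train(vol, label) is then called repeatedly with convnetjs.Vol inputs.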
Q - What is your favorite demo application, and why?
A - I haven't fully finished my favorite demo yet. It is a 60-million parameter convolutional network that is trained on ImageNet (the largest image recognition dataset) using GPUs and then ported to ConvNetJS through JSON. The issue right now is that the model is about 200MB in raw bytes and several gigabytes in JSON format, so it's a little hard to work with. I'm working on quantizing and compressing the network in various ways so that it fits into a few tens of megabytes and still delivers state of the art image recognition performance, but in your browser. I don't consider this to be exceptionally useful but I think it would go a long way in demonstrating what's possible in Javascript, today.
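As an aside on what quantization can mean in its simplest form (a generic illustration only, not necessarily the scheme being used for the ImageNet model): mapping each 32-bit float weight to an 8-bit integer plus a per-array offset and scale already cuts the size roughly four-fold before any further compression.

    // Naive linear 8-bit quantization of a Float32Array of weights (illustrative only).
    function quantize(weights) {
      var min = Infinity, max = -Infinity;
      for (var i = 0; i < weights.length; i++) {
        if (weights[i] < min) min = weights[i];
        if (weights[i] > max) max = weights[i];
      }
      var scale = (max - min) / 255 || 1;                 // guard against all-equal weights
      var q = new Uint8Array(weights.length);
      for (var j = 0; j < weights.length; j++) {
        q[j] = Math.round((weights[j] - min) / scale);
      }
      return { q: q, min: min, scale: scale };            // ~4x smaller than float32
    }

    function dequantize(packed) {
      var w = new Float32Array(packed.q.length);
      for (var i = 0; i < packed.q.length; i++) {
        w[i] = packed.q[i] * packed.scale + packed.min;
      }
      return w;
    }

    // Example round trip on a handful of weights.
    var w = new Float32Array([0.12, -0.7, 1.5, 0.0]);
    console.log(dequantize(quantize(w)));                 // close to the original values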
Q - What was the most surprising result / learning?
A - Javascript's efficiency was the most surprising result and I only expect the trend to continue. We're starting to see a lot of low-level support built into browsers (for example with WebGL) and once there is support for efficient matrix multiplication (a necessary, core building block of not only Neural Networks but almost all Machine Learning algorithms), it will enable a myriad of highly-accessible applications. I'm keeping my fingers crossed - if that happens I'll be able to change a few lines of code and expect to see at least an order of magnitude increase in efficiency.
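The "change a few lines of code" remark is easiest to see if all the heavy lifting funnels through a single matrix-multiply routine; a future WebGL-backed (or otherwise accelerated) version could then be dropped in without touching anything else. A rough sketch of that seam (illustrative code, not ConvNetJS internals):

    // All compute funnels through one function; swapping in an accelerated version later
    // (e.g. a WebGL-backed matmul) only requires replacing this implementation.
    function matmul(A, B, n, k, m) {            // A is n x k, B is k x m, flat Float32Arrays
      var C = new Float32Array(n * m);
      for (var i = 0; i < n; i++) {
        for (var j = 0; j < m; j++) {
          var s = 0;
          for (var p = 0; p < k; p++) s += A[i * k + p] * B[p * m + j];
          C[i * m + j] = s;
        }
      }
      return C;
    }

    // Example: (2x3) * (3x2) = (2x2)
    var A = new Float32Array([1, 2, 3, 4, 5, 6]);
    var B = new Float32Array([7, 8, 9, 10, 11, 12]);
    console.log(matmul(A, B, 2, 3, 2));         // [58, 64, 139, 154]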
Q - Where could this functionality be applied? What potential application are you most excited about?
A - First, I think there are obvious educational applications since it is so easy to set up, train, visualize and tune networks. You can quickly build intuitions for different variables (such as the learning rate, or weight decay) if you fiddle with a slider and see the effect on the accuracy or the features right away.
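A hypothetical sketch of the slider idea, assuming a trainer object that exposes a numeric learning_rate field (as ConvNetJS's Trainer does, to my knowledge); the stand-in trainer below is only there to make the snippet self-contained:

    // Hypothetical wiring of a range slider to a trainer's learning rate (browser only).
    var trainer = { learning_rate: 0.01 };              // stand-in for a real ConvNetJS Trainer

    var slider = document.createElement('input');
    slider.type = 'range';
    slider.min = -4; slider.max = -1; slider.step = 0.1; slider.value = -2;   // log10 scale
    var label = document.createElement('span');
    document.body.appendChild(slider);
    document.body.appendChild(label);

    slider.addEventListener('input', function() {
      var lr = Math.pow(10, parseFloat(slider.value));  // e.g. 10^-2 = 0.01
      trainer.learning_rate = lr;                       // picked up on the next train() call
      label.textContent = ' learning rate: ' + lr.toExponential(2);
    });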
There are some potentially interesting uses for training Neural Networks in browser extensions (modeling some aspects of user behavior, or the content of sites they visit), uses in web-based games, or, from a site owner's perspective, immediate user modeling based on site interaction to deliver a customized experience. Since it's pure Javascript, it's also instantly available on node.js for use on the server side.
The library could also be used in larger and more serious applications in a hybrid setup where the expensive training is done on an efficient backend and the network weights are loaded through JSON for reasonably quick test-time prediction in Javascript. Alternatively, it could serve as a nice browser-based visualization frontend that interacts with a C++ library that potentially trains a network on GPUs. This is how I'm currently using a fork of the library in my own research, as the C++ code I'm working with only scrolls numbers in a console and I have a preference for looking at pretty pictures of learned weights, neat d3js loss/accuracy curves and example predictions to monitor the progress of my network while it trains. It's also easier to click a button to anneal the learning rate than to enter commands in the console.
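The hybrid setup fits in a few lines: the backend exports the trained weights as JSON, and the browser reconstructs the network and runs only the cheap forward pass. The fromJSON/forward/Vol calls below follow the ConvNetJS documentation as I recall it, and the URL is made up, so treat this as a sketch to verify:

    // Sketch of the hybrid setup: weights trained on a fast backend arrive as JSON,
    // prediction runs in the browser. The URL is hypothetical; verify the ConvNetJS
    // calls (Net#fromJSON, Net#forward, Vol) against the current docs.
    fetch('/models/net.json')
      .then(function(res) { return res.json(); })
      .then(function(json) {
        var net = new convnetjs.Net();
        net.fromJSON(json);                        // rebuild the trained network

        var x = new convnetjs.Vol(32, 32, 3);      // in practice, fill with image pixels
        var scores = net.forward(x);               // cheap test-time forward pass
        console.log('class scores:', scores.w);
      });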
Lastly, here's a crazy idea: massively distributed Neural Network training (think FoldIt, or SETI@Home), except every client merely visits a URL and right away starts to contribute Javascript compute time by sending gradient updates to a central server. A few issues have to be addressed first in terms of the modeling: vanilla Neural Networks have dense interactions so they are difficult to parallelize and naive use of distributed optimization techniques is likely to pose problems with stale gradients.
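Purely as a sketch of that crazy idea, and with every detail (the endpoints, the payloads, the stubbed-out gradient computation) made up for illustration, a client's contribution loop might look roughly like this:

    // Hypothetical client loop for massively distributed training in the browser:
    // pull the current global weights, compute a gradient on a small local batch,
    // and post the update back. Endpoints and stubs below are placeholders.
    function sampleLocalBatch() { return []; }            // stub: real data would go here
    function computeGradient(weights, batch) {            // stub: real backprop would go here
      return weights.map(function() { return 0; });
    }

    function step() {
      fetch('/weights')                                   // current global parameters (JSON array)
        .then(function(res) { return res.json(); })
        .then(function(weights) {
          var grad = computeGradient(weights, sampleLocalBatch());
          return fetch('/gradients', {                    // send the update to a central server
            method: 'POST',
            headers: {'Content-Type': 'application/json'},
            body: JSON.stringify(grad)
          });
        })
        .then(function() { setTimeout(step, 0); });       // and repeat
    }
    step();
    // The catch: by the time an update arrives, the server's weights may have moved on,
    // which is exactly the stale-gradient problem mentioned above.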
It really is a terrific resource, with many applications - the ML community should be very grateful that you developed it!
Q - Now, in terms of your more formal research, what are you working on day-to-day?
A - As I alluded to above, my preferred application domain for Machine Learning is Computer Vision. The vast majority of the content on the internet (by both size in bytes and attention) is visual media content, yet we have almost no idea what's in it! In a sense, images and videos are the dark matter of the internet. I also like that analogy because that makes me an Internet Astronomer.
In terms of my general approach, my motto has always been "I like my data large, my algorithms simple and my labels weak". I'm seeking to develop algorithms that gobble up all the images/videos on the internet and learn about the visual world automatically with very few human-provided annotations. With the current state-of-the-art methods, if you want your algorithm to recognize a sheep in an image, you first have to provide it with hundreds or thousands (the more the better) of examples before it can do so reliably. This low-hanging-fruit approach turns out to work well and is essentially how all current industrial applications work, but philosophically it is revolting: your parents didn't have to show you thousands of images of sheep in all possible angles, poses, occlusions and illumination conditions before you could recognize one. They pointed it out once as a "sheep" and you associated the sound with that already-familiar visual concept. The learning was instantaneous and did not require thousands of examples. I work on developing learning algorithms that can do the same, provided that they have seen and processed millions/billions of unlabeled images first.
So far I've only run at most the first mile of the marathon that I expect the above task to be. A representative paper relevant to my current interests at Stanford is my NIPS paper with Adam Coates on training an unsupervised deep learning model on millions of images from YouTube and automatically discovering frequently recurring visual concepts. The paper is very similar in spirit to the famous Google "network discovers cats" paper from Le et al. [ICML 2012]. In retrospect I think there were many things wrong with the modeling and the approach, but I'm hoping to address all that in the coming years!
Definitely a fascinating area to explore - good luck with the next few miles of the marathon! Finally, it is advice time...
Q - What does the future of Machine Learning look like?
A - Broad question! In terms of research I can confidently say that we are going to see a lot of rapid progress in the Neural Network areas I outlined above. There are other areas which I consider promising but know relatively little about, such as Bayesian Optimization and Probabilistic Programming. In terms of applications, I'm convinced that the future of Machine Learning looks very bright and that we will see it become ubiquitous. The skill to manipulate, analyze and visualize data is a superpower today, but it will be a necessity tomorrow.
Lastly, I expect that Machine Learning in Javascript will also become ubiquitous, due to its wide availability and accessibility, the benefits of the browser as a wonderful, powerful, efficient and interactive UI framework, and my hunch that the vast majority of machine learning applications are actually only of medium size and can be easily and immediately crushed with Javascript (in a desktop/mobile browser, or on a node.js server) without a need for complicated backends, pipelines or communication protocols. I expect we should see a very successful and widely used Machine Learning library for Javascript within a few years. Feel free to take this with a grain of salt though, since I have a history as a notorious fan of web-based technologies.
Q - Any words of wisdom for Machine Learning students or practitioners starting out?
A - You learn the most by reinventing the wheel. Don't just read about Machine Learning algorithms and fall into the trap of thinking you understand the concepts because everything you read sounds reasonable. Read it once and then re-implement it from scratch, yourself. And while you're at it, do it in Javascript ;)
Andrej - Thank you so much for your time! Really enjoyed learning more about your background and what you are working on now - both your personal projects and more formal research agenda. Andrej's blog can be found online at http://karpathy.ca/myblog and he is on twitter @karpathy.
Readers, thanks for joining us!
P.S. If you enjoyed this interview and want to learn more about
- what it takes to become a data scientist
- what skills you need
- what type of work is currently being done in the field
then check out Data Scientists at Work - a collection of 16 interviews with some of the world's most influential and innovative data scientists, who each address all the above and more! :)
from: http://www.datascienceweekly.org/data-scientist-interviews/training-deep-learning-models-browser-andrej-karpathy-interview