Wednesday, December 31, 2014

An Interesting Perspective towards Machine Reasoning

Recently, I came across an interesting article exploring potential directions for future machine learning research, "From Machine Learning to Machine Reasoning". The main theme of the article is a plausible definition of "reasoning": "algebraically manipulating previously acquired knowledge in order to answer a new question".

It is an interesting perspective, as it explains how representation learning, transfer learning and multi-task learning could help construct practical machine learning systems for computer vision and natural language processing. Below is an example from the paper of training a face recognition system:

Figure 1 in the paper "From Machine Learning to Machine Reasoning"

If we consider trained models (either for the underlying task or for other related tasks) as previously acquired knowledge, then the article effectively advocates building systems sequentially by re-using the representations (per sample or per category) obtained along the way. In this sense, the recent Dark Knowledge and FitNets work shares a similar spirit in the realm of neural networks.
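As a rough illustration of that spirit, below is a minimal NumPy sketch of the soft-target idea behind Dark Knowledge: a small "student" model is trained against the temperature-softened outputs of a previously trained "teacher", so the knowledge embedded in the teacher's representations is re-used rather than relearned from scratch. The temperature and blending weight here are illustrative assumptions of mine, not settings from the papers.

    import numpy as np

    def softmax(logits, temperature=1.0):
        # Temperature-softened softmax: a higher temperature spreads probability
        # mass over non-target classes, exposing the teacher's "dark knowledge".
        z = logits / temperature
        z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, hard_labels,
                          temperature=4.0, alpha=0.5):
        # Blend of cross-entropy on the hard labels and cross-entropy on the
        # teacher's softened predictions (illustrative hyper-parameters).
        soft_teacher = softmax(teacher_logits, temperature)
        soft_student = softmax(student_logits, temperature)
        hard_student = softmax(student_logits)

        # Cross-entropy against the one-hot ground-truth labels.
        n = len(hard_labels)
        hard_loss = -np.log(hard_student[np.arange(n), hard_labels] + 1e-12).mean()

        # Cross-entropy against the teacher's soft targets.
        soft_loss = -(soft_teacher * np.log(soft_student + 1e-12)).sum(axis=-1).mean()

        return alpha * hard_loss + (1.0 - alpha) * soft_loss

The same re-use flavor appears in FitNets, where intermediate representations of the teacher, not just its outputs, guide the student.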

Saturday, December 20, 2014

The Secret of Bokeh

Bokeh, the technique of deliberately creating out-of-focus areas in photos, has been widely used by photographers to emphasize semantically important subjects. These subjects can lie in either the foreground or the background of a photo. An example of portrait bokeh is shown below:


Photograph by Di Liu

The bokeh effect is usually produced by a particular lens on an SLR camera. However, cameras in mobile devices have become increasingly ubiquitous and are now a common choice for recording daily life. So the question arises: how can we design algorithms that generate the bokeh effect from a well-focused photo shot by a mobile device?

In fact, this functionality (creating a semantically out-of-focus photo from a well-focused one) is a well-studied topic in computational photography. A typical algorithm comprises two stages: depth map estimation and selective blurring.

First, we would like to obtain a depth map of the photo in order to distinguish foreground from background. Estimating a depth map from a single image has long been known to be difficult. A practical solution is to leverage the information in an image sequence (burst mode on mobile devices) so that a multi-view stereo problem can be formed. Google's Lens Blur adopts this approach, and the team published a research blog post elaborating the techniques. Incidentally, burst images can also be utilized for denoising.

After that, we can perform adaptive per-pixel blurring according to the computed depth map, or adopt more advanced techniques to enable motion blur rendering. To make the final result more seamless and natural, the rendering can be done in a multi-scale manner, e.g. with local Laplacian pyramids. The same pipeline can also be applied to generate 3D-effect images from burst images, as demonstrated in this website.
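To make the second stage concrete, here is a minimal sketch of depth-guided selective blurring with OpenCV and NumPy. It assumes a depth map is already available (e.g. from the multi-view stereo step, normalized to [0, 1]) and simply blends a few precomputed Gaussian-blurred versions of the image according to each pixel's distance from the focal plane. The focal depth, blur radii and blending scheme are illustrative assumptions; a production pipeline would use proper circle-of-confusion radii and the multi-scale rendering described above.

    import cv2
    import numpy as np

    def selective_blur(image, depth, focus_depth=0.2, max_radius=15):
        # Depth-guided blurring: pixels near focus_depth stay sharp, pixels far
        # from it receive increasingly strong Gaussian blur.
        # (Illustrative parameters, not a production bokeh renderer.)

        # Per-pixel blur strength in [0, 1], proportional to distance from the focal plane.
        strength = np.clip(np.abs(depth - focus_depth) / (1.0 - focus_depth), 0.0, 1.0)

        # Precompute a few blurred versions of the image with increasing radius.
        levels = []
        for radius in (0, max_radius // 3, 2 * max_radius // 3, max_radius):
            if radius == 0:
                levels.append(image.copy())
            else:
                k = 2 * radius + 1  # Gaussian kernel size must be odd
                levels.append(cv2.GaussianBlur(image, (k, k), 0))

        # Linearly blend between the two nearest blur levels at every pixel.
        idx = strength * (len(levels) - 1)
        lo = np.floor(idx).astype(int)
        hi = np.minimum(lo + 1, len(levels) - 1)
        frac = (idx - lo)[..., None]
        stack = np.stack(levels).astype(np.float32)          # shape (L, H, W, 3)
        rows, cols = np.indices(depth.shape)
        out = (1.0 - frac) * stack[lo, rows, cols] + frac * stack[hi, rows, cols]
        return out.astype(image.dtype)

For a quick experiment, a simple left-to-right gradient can stand in for the depth map, which makes the sharp-to-blurry transition easy to inspect.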

Face Triangulation for Graphic Design

Nowadays "low poly" has emerged as a popular element in graphic design, creating sculpture/crystal-like effects. Various movie posters util...