二次啟航: 2015

Sunday, December 20, 2015

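Notes from My First AGU

Two days before departure, 12/10

The poster workshop I attended last Friday was quite useful. The core of a scientific poster is conveying content, so following a template not only saves time but also makes the poster more accurate and effective.

I got to the office early in the morning, and when I ran the trend attribution with the wind function that needs no calibration, I was dumbfounded. I had not known that (a) different forms of the Penman equation can all be used for attribution, and (b) Donohue's method is not complete. Fortunately I still had all my earlier code, so after rewriting the function I could tell whether my changes had a positive effect. This birthday will be hard to forget: I worked alongside everyone in the office until past 1 a.m. and got all the figures done. What remains is filling in the poster.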

Time Series: Fundamental Concepts

Before we talk about ARMA models (frequently seen in the economics literature), let's begin with the basic concepts on which all these time series models are built. The concepts reviewed include autocovariance, autocorrelation, partial autocorrelation, and the moving average and autoregressive representations of time series. I will also give a brief introduction to stationarity and ergodicity.

A stochastic process is a family of random variables that describes the evolution through time of some process. We can view a stochastic process as a process described by its statistical properties, as opposed to a deterministic process such as sin(t). A stochastic time series is a realization of a certain stochastic process.

Complete (strict) stationarity: the joint distribution function of the time series is time-invariant. Why should we worry about stationarity? Most time series analysis assumes it, so we need to remove nonstationarity (for example, a trend) before the analysis, which means a lot of preprocessing.
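As a rough illustration of two of the concepts above, the sketch below computes a sample autocorrelation and removes a linear trend by first differencing. The function names and the toy series are illustrative assumptions, and the 1/n normalization of the autocovariance is one common convention rather than the only one.

```python
def autocovariance(x, lag):
    """Sample autocovariance at a given lag (1/n normalization, an assumed convention)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def autocorrelation(x, lag):
    """Sample autocorrelation: autocovariance at `lag` divided by the lag-0 value."""
    return autocovariance(x, lag) / autocovariance(x, 0)


def difference(x):
    """First difference x[t+1] - x[t], a simple way to remove a linear trend."""
    return [b - a for a, b in zip(x, x[1:])]


# Toy nonstationary series: a pure linear trend (hypothetical data).
trend = [0.5 * t for t in range(20)]
print(autocorrelation(trend, 1))   # close to 1: the trend dominates
print(difference(trend)[:3])       # constant increments once the trend is removed
```

A strongly trending series shows high autocorrelation at short lags simply because of the trend, which is one reason such preprocessing usually comes first.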

Sunday, November 15, 2015

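Research and Rock Climbing

In the past, when I climbed, it never occurred to me that climbing had anything to do with research. Today a senior labmate told me that doing research is like rock climbing, and the comparison struck me as remarkably apt and elegant.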

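As students, we have homework and course reports to finish. This is like climbing a route of a given grade, where the point is to build ability through practice.
I remember that when I first started, my toes could not even stay on a hold, my legs shook, and I could not finish even a grade-one route. Looking up at the giant wall in front of me, I thought learning to climb was impossible! But after a few lessons I picked up some basics, such as shifting my center of gravity and the order of hand and foot moves, and my body got stronger. Gradually, grades one and two went by on their own.
However, as the saying goes, the master leads you through the door, but the practice is up to you. Once you are past the basics of climbing, improvement comes slowly. Many questions come up at this stage: should I boulder more or climb upward more? Train strength or technique? Surviving in a university is likewise a process of learning to teach yourself, and everyone takes a very different path.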

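Research, then, is free climbing on a highly random wall. The difficulty is distributed so unevenly that, just when progress is fast and smooth, you cannot guess that the next move will be a mental (muscular)... breakdown (tear)...
At that time my core strength was weak, and I could not get past holds that were far apart. Being unable to go up yet unwilling to come down is painful, not only because you have hit a bottleneck, but also because your muscles cannot take hanging on the wall in one position; after two or three minutes you have no choice but to come down. Sometimes a research hole has been dug for years, and you gradually forget why you started digging it, or even begin to doubt the point of digging. Drag it out to the end and you are too worn out to care anymore.
Sometimes the next hold is right in front of you, but your ability and physical condition are not enough to reach it... At that point, would you give up this climb and go train on the machines at the gym? But once you are at the gym, you may find that building muscle is hard too, and you end up abandoning the climb and failing at the gym as well. I used to think this way a lot: if only I had learned more math and physics; you regret how little you have read only when you need it. Looking back now, though, if I had done my best to finish every route each time, even with some detours, I would probably have gained a good deal of confidence! And ability grows through overcoming those difficulties. Unwilling to suffer at the gym, yet unwilling to patiently take detours: that is probably what is wrong with my research.
Still, for most of research we do not even know where the goal is. That is when I start to miss the fixed routes my advisor set for me. If, before climbing, you take a few steps back, stand farther away, study the holds carefully, and get a rough sense of the direction, maybe you will not end up stepping on a watermelon rind and sliding wherever it takes you.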

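And life is the real outdoors. Nature is ten thousand times harder than the indoor wall, yet it is also the most alluring.
May my passion for life never die out.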

Tuesday, November 3, 2015

Standard deviation vs. standard error of the mean

When I calculated the STD of two different samples in software, I was confused because I got a higher STD for the smaller sample. I had expected the STD to be greater for the larger sample, because it would have a larger spread.

Then I realized I've mixed up the population and sample again.

1. The standard deviation is a measurement of the "spread" of your data.

The standard deviation does not become lower when the number of measurements grows.

Note that both mean and standard deviation are population properties.

2. The standard error of the mean is the standard deviation of your estimate of the mean.

The standard error of the mean (i.e., the precision of your estimate of the mean) does get smaller as sample size increases.

If you take more measurements, you get a more accurate picture of the spread. That is, as you increase the number of observations, your sample will on average give more precise estimates of both the population mean and the population standard deviation.
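The distinction can be checked with a small simulation. This is a minimal sketch, assuming normally distributed data with a true standard deviation of 2; the sample sizes, the seed, and the helper name `sd_and_sem` are arbitrary illustrative choices.

```python
import math
import random
import statistics


def sd_and_sem(sample):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(sample)        # uses n - 1 in the denominator
    sem = sd / math.sqrt(len(sample))    # SEM = SD / sqrt(n)
    return sd, sem


rng = random.Random(0)                   # fixed seed so the run is repeatable
small = [rng.gauss(0.0, 2.0) for _ in range(50)]
large = [rng.gauss(0.0, 2.0) for _ in range(5000)]

sd_small, sem_small = sd_and_sem(small)
sd_large, sem_large = sd_and_sem(large)
print("n=50:   SD=%.3f  SEM=%.3f" % (sd_small, sem_small))
print("n=5000: SD=%.3f  SEM=%.3f" % (sd_large, sem_large))
```

Both SD values should land near 2 regardless of sample size, while the SEM shrinks roughly by a factor of 10 when the sample is 100 times larger.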

Wednesday, October 28, 2015

[R] Reset the environment variable for R in El Capitan

The issue is that I can't call R from the terminal after installing El Capitan. This is a result of the new restriction on write permission to /usr/bin. See this post: R in El Capitan public beta (new security model).

1. Find the R binary.
I have RStudio, where I can open an R console. In RStudio, use the following command to find where the R binary, or RHOME, is.
If you can't find your directory, you can also look for R in this path. My system is El Capitan.

2. Modify the path profile.
Add the directory to your system path. In terminal:
$ sudo nano /etc/paths
Enter your password when prompted.
Go to the bottom of the file, and enter the path you found in the first step.
Hit control-x to quit.
Enter "Y" to save the modified buffer.
Hit enter to confirm when prompted by "File name to write".
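After editing /etc/paths and opening a new terminal, one way to check that the change took effect is to ask which executable the PATH resolves to. The sketch below does this from Python with the standard library; the helper names `find_on_path` and `path_entries` are illustrative, and the command name "R" is simply the one this post is about.

```python
import os
import shutil


def find_on_path(command, path=None):
    """Return the full path of `command` if it is found on PATH, else None."""
    return shutil.which(command, path=path)


def path_entries(path=None):
    """Split a PATH-style string (default: the current PATH) into its directories."""
    raw = os.environ.get("PATH", "") if path is None else path
    return [entry for entry in raw.split(os.pathsep) if entry]


print(find_on_path("R"))      # None means R is still not reachable from PATH
print(path_entries()[-3:])    # the last few PATH entries, where new lines in /etc/paths tend to show up
```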

Major references:
Add to the PATH on Mac OS X 10.8 Mountain Lion
Set environment variables on Mac OS X Lion

Sunday, October 25, 2015

Quantile

Many measures of an empirical distribution rely on quantiles. According to Wikipedia, quantiles are cut points that divide the distribution of the observations into equal-sized groups. We can understand this concept easily by marking the cut points on the probability distribution of the data. A sample quantile Qp is a value with the same unit as the data that exceeds a proportion p (0<p<1) of the data; equivalently, it is the (p*100)th percentile of the data. For example, the median is also written Q0.5, meaning the data point that exceeds 50% of the data.
Determining quantiles requires the order statistics of the data. One definition, copied from order statistics:
Our goal is to find the value that is the fraction $p$ of the way through the (ordered) data set. We define the rank of the value that we are looking for as $(n-1)p+1$. Note that the rank is a linear function of $p$, and that the rank is $1$ when $p=0$ and $n$ when $p=1$. But of course, the rank will not be an integer in general, so we let $k=\lfloor (n-1)p+1 \rfloor$, the integer part of the desired rank, and we let $t=[(n-1)p+1]-k$, the fractional part of the desired rank. Thus, $(n-1)p+1=k+t$ where $k\in\{1,2,\ldots,n\}$ and $t\in[0,1)$. So, using linear interpolation, we define the sample quantile of order $p$ to be

$$x_{[p]} = x_{(k)} + t\,[x_{(k+1)} - x_{(k)}] = (1-t)\,x_{(k)} + t\,x_{(k+1)}$$
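The interpolation rule quoted above translates almost directly into code. Here is a minimal sketch in plain Python; the function name `sample_quantile` is illustrative, and the special case for p = 1 is an added guard, since x(k+1) does not exist when k = n.

```python
import math


def sample_quantile(data, p):
    """Sample quantile of order p by linear interpolation between order statistics."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)              # order statistics x(1) <= ... <= x(n)
    n = len(x)
    rank = (n - 1) * p + 1        # desired 1-based rank
    k = math.floor(rank)          # integer part of the rank
    t = rank - k                  # fractional part, in [0, 1)
    if k >= n:                    # p == 1: t is 0 and there is no x(k+1)
        return x[-1]
    # x(k) is x[k - 1] in 0-based indexing, and x(k+1) is x[k]
    return (1 - t) * x[k - 1] + t * x[k]


print(sample_quantile([40, 10, 30, 20], 0.5))   # rank 2.5, halfway between 20 and 30
```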

However, this is only one of the nine ways to compute quantiles (R7 in Wikipedia), and not even the best one. Its widespread use is a legacy of the computational load in the past (see this article). That article also discusses the best estimation method, R8 in Wikipedia, which is connected to the Tukey plotting position formula through the CDF, discussed later.
Sometimes we want to compare two distributions: for example, to see whether two empirical distributions share common features, or whether an empirical distribution can be fitted by a theoretical one. Histograms and the ECDF have been widely used for fitting a theoretical distribution, but histogram results depend heavily on the bin width. The quantile-quantile plot is a more robust way to do the comparison. A qq-plot is a scatterplot in which each point pairs a data value with the corresponding estimate for that value derived from the quantile function of the fitted distribution. Note that the quantile function is the inverse of the cumulative distribution function, so the plotting-position methods for the CDF are the inverses of the quantile estimation methods. This article about qq_plot is a very detailed and clear online resource for understanding the basis of CDF matching.
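To make the qq-plot construction concrete, the sketch below computes the coordinate pairs against a normal distribution, without plotting. It assumes the Tukey plotting position p_i = (i - 1/3) / (n + 1/3) and uses `statistics.NormalDist.inv_cdf` as the quantile function; the function name `qq_points` and the default of a standard normal are illustrative choices.

```python
from statistics import NormalDist


def qq_points(data, dist=None):
    """Return (theoretical quantile, data value) pairs for a qq-plot."""
    if dist is None:
        dist = NormalDist()       # standard normal as the assumed reference distribution
    x = sorted(data)
    n = len(x)
    points = []
    for i, value in enumerate(x, start=1):
        p = (i - 1.0 / 3.0) / (n + 1.0 / 3.0)      # Tukey plotting position
        points.append((dist.inv_cdf(p), value))    # quantile function = inverse CDF
    return points


for theoretical, observed in qq_points([3.1, 0.4, 1.9, 5.2, 4.0]):
    print("%.3f  %.1f" % (theoretical, observed))
```

If the data follow the reference distribution up to location and scale, these points fall close to a straight line.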

Friday, October 16, 2015

[Linux] Search all files containing a given pattern

$ grep -rnw 'directory' -e "pattern"

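Here -r searches recursively, -n prints line numbers, and -w matches whole words only. If you are already inside the directory, use '.' as the directory (GNU grep also searches the current directory when the directory argument is omitted).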
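For cases where grep is not available, roughly the same search can be sketched in Python. This is an approximation under stated assumptions: the pattern is treated as a literal string rather than a regular expression, whole-word matching is emulated with `\b`, and the function name `search_files` is illustrative.

```python
import os
import re


def search_files(directory, pattern):
    """Yield (path, line_number, line) for lines containing `pattern` as a whole word."""
    word = re.compile(r"\b" + re.escape(pattern) + r"\b")
    for root, _dirs, files in os.walk(directory):          # recursive, like -r
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "r", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):   # like -n
                        if word.search(line):                          # like -w
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue   # unreadable file: skip it


# Example usage: list matches under the current directory.
for path, number, line in search_files(".", "pattern"):
    print("%s:%d:%s" % (path, number, line))
```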

Wednesday, October 7, 2015

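Year Three Already

All I can say is that, with the general exam behind me, my motivation and drive have not picked up. Could this really be a post-summer-vacation hangover?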

Wednesday, September 2, 2015

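The Mechanism That Controls Emotions

I have long been very curious about the relationship between body and mind, and between knowing and doing.

According to Plato's theory of the soul, passion, desire, and reason are the three components of human nature. When reason governs passion and desire, a person is in a healthy, sound state, as strong as a city-state whose ruler holds great authority. The Chinese translation can hardly convey what exactly Plato meant by passion and desire; interested readers can look into Plato's theory of the city-state. This does not stop us from understanding the famous hierarchical model, namely that passion and desire should obey reason.
A city-state has its times of unrest too. So when facing a disturbance, should a person's emotions follow desire or reason? It seems that if we follow desire we cannot get long-term, stable happiness, while if we obey reason completely our emotions become dejected and low in morale, like an army that lacks drilling. For example, I watched two movies in a row while I was supposed to be studying and felt a deep emptiness; people nitpick at the flawless Spring Festival Gala, yet prefer nonsensical entertainment shows. This was the first puzzle I ran into: hierarchical theories tend to have splendid frameworks, but they cannot explain many phenomena.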

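What do these phenomena have in common?
Negative feedback. Life is full of all kinds of nonlinear relationships; never mind a hierarchical theory, even a complex sociological model cannot capture a thousandth of them.
So at first I borrowed a concept from ecology to think about this problem. When a disturbance appears, an ecosystem seeks some new equilibrium, and this new equilibrium depends not only on the initial and boundary conditions but also on the randomness of the disturbance. If you drew a probability distribution of emotional highs and lows, it might have N peaks; in the language of optimization, the rise and fall of your mood depends on whether you get stuck at a local maximum. Here I ran into the second puzzle: is there any pattern to when disturbances appear? For instance, how much pressure and how low a mood does it take before I want to watch a movie?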

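This is supposedly the age of "data", so we should think a bit in terms of "data". Can we draw a large number of samples from life events and observe how our emotions work?
This process is a lot of fun. I started using sleep-tracking software two years ago, and a year ago I began recording my daily mood and routine. From these records I found that disturbances are not that random: their recurrence, and even their strengthening, is tied to past memories. I had also read on Zhihu or Guokr about excitation and inhibition: with sex and drug addiction alike, as the dose goes up, the impulse/excitement is actually suppressed. What I kept puzzling over and could not work out was how a disturbance could recur and even intensify in the presence of negative feedback! These theories flatly contradicted each other and made no sense!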

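Finally, a while ago, when I was reading Psycho-Cybernetics, an idea struck like lightning out of the sky.
It is actually quite simple: excessive negative feedback. You may have noticed the "needle-threading phenomenon" in life: you are off by a hair, and your hand shakes so much that the thread will not go through. A star striker hits the post with his final shot; a task gets 80% done, and then the last 20% simply will not get finished amid self-doubt. Whenever we move toward a goal, negative feedback arrives on schedule, and its purpose is to steer us around the reefs toward the goal. But just as we are about to reach the goal, if the negative feedback, acting as a negative memory, becomes too strong, it shuts down the correct feedback mechanism.
Well, you may have seen that article on the mindset for research, which says these disturbances are resistance. Ha, ecology, psychology, and sociology really are one family. In ecology, disturbance refers to things like wildfire, drought, and flood, which look like external factors but may well arise from the system's internal resilience. The way wildfire breaks up and reorganizes an ecosystem should be an indispensable part of ecological succession.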

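Suddenly many phenomena seemed to be explained: disturbances are a product of negative feedback!! Disturbance and ambition are twins!! For a moment, I felt I had glimpsed the secret of emotions.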

Monday, August 31, 2015

[Python] Tricks list

Since I started learning Python, I have not kept any list of "tricks" to revisit, and because of that my coding style is rather outdated. I have now decided that any time I see a useful piece of code on sites such as Stack Overflow, I will add it to this list. If you are also learning Python, you might find that quite a few of these are new to you and surprisingly useful, as I did.

Just so you know, I built this list based on this nice blog, which inspired me to make a similar list.

1. Define and swap variables

>>> a, b, c = (2 * i + 1 for i in range(3))
>>> a, b, c
(1, 3, 5)

>>> a, b = 1, 2
>>> a, b = b, a
>>> a, b
(2, 1)

2. Slicing and negative indexing

>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> a[::2]
[0, 2, 4, 6, 8, 10]

>>> a[-1]  # last element
10
>>> a[-4:-2]  # fourth-to-last up to (not including) second-to-last
[7, 8]

>>> a = [1, 8, 9, 2, 0, 0, 4, 5]
>>> a[1:-1] = [] # delete some list elements
>>> a
[1, 5]

3. Flip up/down array

a[::-1]

4. Iteration

enumerate: Iterating over list index and value pairs

>>> a = ['Hello', 'world', '!']
>>> for i, x in enumerate(a):
...     print('{}: {}'.format(i, x))  # str.format fills each {} in order
...
0: Hello
1: world
2: !

dict.items (dict.iteritems in Python 2): Iterating over dictionary key and value pairs

>>> m = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> for k, v in m.items():
...     print('{}: {}'.format(k, v))  # Python 3.7+ keeps insertion order
...
a: 1
b: 2
c: 3
d: 4

5. Flatten a list

>>> import itertools
>>> a = [[1, 2], [3, 4], [5, 6]]
>>> list(itertools.chain.from_iterable(a))
[1, 2, 3, 4, 5, 6]

>>> sum(a, [])
[1, 2, 3, 4, 5, 6]

>>> [x for l in a for x in l]
[1, 2, 3, 4, 5, 6]

>>> a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
>>> [x for l1 in a for l2 in l1 for x in l2]
[1, 2, 3, 4, 5, 6, 7, 8]

>>> a = [1, 2, [3, 4], [[5, 6], [7, 8]]]
>>> flatten = lambda x: [y for l in x for y in flatten(l)] if isinstance(x, list) else [x]
>>> flatten(a)
[1, 2, 3, 4, 5, 6, 7, 8]

6. Flatten an array

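Turn an Nx1 NumPy array into a 1D array

import numpy as np

arr2d = np.arange(5).reshape(-1, 1)  # example Nx1 array

arr1d = np.ravel(arr2d)      # returns a view when possible
arr1d = arr2d.ravel()        # method form of np.ravel
arr1d = arr2d.flatten()      # always returns a copy

arr1d = np.reshape(arr2d, -1)
arr1d = arr2d[:, 0]          # take the single column; arr2d[0, :] would give only the first row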
