@gekeshi 2016-12-19T09:05:43.000000Z

Note on DeepPicker

Cryo-EM

Abstract

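The paper aims to use a CNN to pick qualified particles from cryo-EM MRC micrographs [1].
Overall, the neural network is used to score small square patches of each MRC image, producing a score map; several filtering steps are then applied to pick the particles.

Problems & opportunities:

Is retraining on the preliminary picks from the test MRC images a form of cheating? How does it relate to refinement?
Could an object detection method produce the candidate particles directly, and, going further, could the filtering steps also be done inside the neural network?
Grayscale images containing noise?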

Dataset

$\gamma$ -secretase[2] $\&$ spliceosome[3]

The $\gamma$ -secretase and spliceosome datasets were obtained from Dr. Yigong Shi's lab at Tsinghua University.

$\gamma$ -secretase:
Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 4UIS), and the EM maps have been deposited in the Electron Microscopy Data Bank, www.ebi.ac.uk/pdbe/emdb (accession no. EMD-2974).
TRPV1[4]

The TRPV1 dataset was downloaded from EMPIAR (entry ID: EMPIAR-10005).
$\beta$ -galactosidase[5]

Data deposition: The density map derived by cryo-electron microscopy and the fitted atomic model have been deposited in the EMDataBank (EMDB), www.emdatabank.org (EMDB ID code EMD-5995), and the Protein Data Bank (PDB), www.pdb.org (PDB ID code 3J7H), respectively.
N-ethylmaleimide sensitive factor complex[6]
Public dataset[7]
Some public MRC datasets, with annotations.

Data Pre-process

The defocus value of each micrograph was calculated using CTFFIND4 (Rohou and Grigorieff, 2015).

The performance of our fully automated particle picking method was relatively robust at different defocus levels.
For each micrograph, we first used a Gaussian filter as a low-pass filter to remove white noise with high-frequency components.
Then the binning strategy (Li et al., 2013) was used to convert each original micrograph to an image ranging between 1000 and 2000 pixels.
In addition, all the coordinates of the reference particles were further aligned using FREALIGN (Grigorieff, 2007).
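
The binning step can be illustrated with a minimal sketch. It assumes plain block averaging, which is a simple stand-in and not necessarily the strategy of Li et al. (2013); the function names `choose_bin_factor` and `bin_image` and the `max_side` parameter are hypothetical. The Gaussian low-pass filter is left out.

```python
def choose_bin_factor(height, width, max_side=2000):
    """Smallest integer factor that brings the longer side down to <= max_side."""
    longer = max(height, width)
    return max(1, -(-longer // max_side))  # ceiling division


def bin_image(image, factor):
    """Average non-overlapping factor x factor blocks (edge remainder is cropped).

    `image` is a list of rows of pixel values. Block averaging is an
    assumption made for illustration only.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    area = factor * factor
    binned = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for dr in range(factor):
                for dc in range(factor):
                    total += image[r * factor + dr][c * factor + dc]
            row.append(total / area)
        binned.append(row)
    return binned
```

For a micrograph whose longer side exceeds 2000 pixels, the smallest such factor leaves the longer side between 1000 and 2000 pixels, which matches the target range quoted above.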

Method

Training & picking
1. The initial training data are obtained from the known particles of other molecular complexes whose structures have been previously solved via cryo-EM.
2. Train a pre-trained model on these data.
3. Run the first particle picking iteration.
4. Retrain the model using the picking result from step 3.
5. Run the second particle picking iteration.

particle picking

scoring

A sliding window (i.e., a square box) of a fixed size is used to scan each micrograph from the top left corner to the bottom right corner with a constant step size. The box size of the sliding window is chosen to be slightly larger than the particle size, which can be easily estimated and defined as a parameter.
The CNN model outputs a prediction score between 0 and 1, which represents the probability of a particle being at the current position; this score is then assigned to the center of the corresponding window.
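
To make the scanning step concrete, here is a rough sketch of how a score map could be built. The names `score_map` and `mean_intensity` are hypothetical, and `mean_intensity` is only a placeholder for the CNN, which is not reproduced here.

```python
def score_map(image, box_size, step, score_fn):
    """Slide a square box over the image and record a score at each window center.

    Returns (center_row, center_col, score) tuples in scan order,
    from the top left corner to the bottom right corner.
    """
    height, width = len(image), len(image[0])
    scores = []
    for top in range(0, height - box_size + 1, step):
        for left in range(0, width - box_size + 1, step):
            patch = [row[left:left + box_size] for row in image[top:top + box_size]]
            center_row = top + box_size // 2
            center_col = left + box_size // 2
            scores.append((center_row, center_col, score_fn(patch)))
    return scores


def mean_intensity(patch):
    """Placeholder scorer (assumption): mean pixel value clipped to [0, 1]."""
    values = [v for row in patch for v in row]
    return min(1.0, max(0.0, sum(values) / len(values)))
```

With a step of 1 the result has one score per valid window position, so the score map is simply a coarser image whose pixel values are probabilities.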

What does the score map look like?
cleaning

As ice noise can easily introduce false positives during the picking process, we also employ a cleaning step to discard these false particles from the candidate list.
We first connect any two neighboring pixels if their prediction scores are both above a threshold, and then examine the size of each connected domain (i.e., the portion of all connected pixels).
If the size of a connected domain is larger than a cutoff value, it is regarded as a potential false positive probably due to ice noise.
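
One way to read the connected-domain step is as connected-component labeling on the thresholded score map. The sketch below assumes 4-connectivity and a breadth-first search; the function name `large_domains` and its parameters are hypothetical, and the paper may define neighbors differently.

```python
from collections import deque


def large_domains(score_grid, threshold, max_size):
    """Return connected domains (lists of (row, col)) with more than max_size pixels.

    Two pixels are connected when they are 4-neighbors and both scores
    exceed `threshold` (4-connectivity is an assumption).
    """
    rows, cols = len(score_grid), len(score_grid[0])
    seen = [[False] * cols for _ in range(rows)]
    flagged = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or score_grid[r][c] <= threshold:
                continue
            # Breadth-first search collects one connected domain.
            domain, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                domain.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and score_grid[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(domain) > max_size:
                flagged.append(domain)
    return flagged
```

A real particle should light up only a small blob of high scores around its center, while an ice patch produces a large contiguous region; candidates inside the flagged domains would then be discarded.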

I don't understand how the connected domain step works.
filtering

We aim to refine the current set of particle candidates and also identify the center coordinates of the final remaining particles from the score map. We first introduce the concept of a peak window, the size of which is related to the minimum distance between the centers of two possible particles.
Then the position with the maximum prediction score in each peak window is chosen and output as the center of a particle. We also remove bad particle candidates in which the number of extreme pixels is more than three standard deviations away from the mean.
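
A possible reading of the peak window is local-maximum selection: a position is kept when no higher score exists inside a window centered on it. The sketch below follows that assumption, which may differ from the paper's exact procedure. The names `pick_peaks`, `keep_inliers`, `min_score`, and `n_sigma` are hypothetical, and how the extreme pixels are counted is left to the caller.

```python
from statistics import mean, pstdev


def pick_peaks(score_grid, window, min_score=0.5):
    """Keep positions holding the maximum score within a window centered on them.

    Ties are all kept. Centering the window on each pixel is an assumption.
    """
    rows, cols = len(score_grid), len(score_grid[0])
    half = window // 2
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = score_grid[r][c]
            if score < min_score:
                continue
            neighborhood = [
                score_grid[y][x]
                for y in range(max(0, r - half), min(rows, r + half + 1))
                for x in range(max(0, c - half), min(cols, c + half + 1))
            ]
            if score >= max(neighborhood):
                peaks.append((r, c, score))
    return peaks


def keep_inliers(extreme_counts, n_sigma=3.0):
    """Indices of candidates whose extreme-pixel count is within n_sigma of the mean."""
    mu, sigma = mean(extreme_counts), pstdev(extreme_counts)
    return [i for i, n in enumerate(extreme_counts)
            if sigma == 0 or abs(n - mu) <= n_sigma * sigma]
```

Setting `window` close to the particle diameter keeps two picks from landing on the same particle, which is why its size relates to the minimum distance between particle centers.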

How exactly is the peak window chosen?
sorting

We sort the remaining particle candidates according to their prediction scores.
iteration

We use the particles picked by the previous CNN classifier, which was trained over the known particles of other molecules, to further refine the CNN model. After a certain number of iterations, the algorithm outputs the top list of the highest-rated particles.

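The preliminarily picked target particles are used to further train the scoring network, after which particle picking is run once more.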
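
The control flow of this train-pick-retrain loop can be sketched as follows. This is only an outline under stated assumptions: `iterative_picking`, `train`, and `pick` are hypothetical names, the two callables stand in for CNN training and for the scoring, cleaning, and filtering pipeline, and using all picks as new training samples is a simplification.

```python
def iterative_picking(train, pick, initial_samples, micrographs,
                      iterations=2, top_n=None):
    """Alternate training and picking; each round retrains on the previous picks.

    `train(samples, previous_model)` returns a model (previous_model is None
    in the first round). `pick(model, micrographs)` returns a list of
    (particle, score) pairs. Both callables are hypothetical placeholders.
    """
    samples = initial_samples
    model = None
    picked = []
    for _ in range(iterations):
        model = train(samples, model)
        # Sort candidates by prediction score, highest first.
        picked = sorted(pick(model, micrographs), key=lambda p: p[1], reverse=True)
        samples = [particle for particle, _ in picked]
    return picked if top_n is None else picked[:top_n]
```

With `iterations=2` this reproduces the five-step schedule: train on particles of other complexes, pick, retrain on the picks, and pick again.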

semi-automated particle picking

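A small number of target particles are picked manually and used to train the network. Semi-automated picking serves as a comparison against fully automated picking.

An alternative training scheme is to let the user manually select a small number of particles as positive samples to train the CNN model and initialize the particle selection process.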

evaluation

Thus, an additional effective method for evaluating the practicability of an automated particle picking approach is to further examine the 2D clustering and class averaging results of the identified particles.

Since 2D classification can filter out false-positive particles, could it be done together with autopicking? (This is already implemented in the GitHub project.)

another paper

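This paper from Peking University [8] manually picks some training data, trains a scoring network on particle images, obtains scanning-window scores at test time, selects candidates by thresholding, and then filters false positives by standard deviation.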

[1] Wang, Feng, Huichao Gong, Gaochao Liu, Meijing Li, Chuangye Yan, Tian Xia, Xueming Li, and Jianyang Zeng. 2016. “DeepPicker: A Deep Learning Approach for Fully Automated Particle Picking in Cryo-EM.” Journal of Structural Biology 195 (3): 325–36.
[2] Sun, Linfeng, Lingyun Zhao, Guanghui Yang, Chuangye Yan, Rui Zhou, Xiaoyuan Zhou, Tian Xie, et al. 2015. “Structural Basis of Human $\gamma$ -Secretase Assembly.” Proceedings of the National Academy of Sciences 112 (19): 6003–8. doi:10.1073/pnas.1506242112.
[3] Yan, Chuangye, Jing Hang, Ruixue Wan, Min Huang, Catherine C. L. Wong, and Yigong Shi. 2015. “Structure of a Yeast Spliceosome at 3.6-Angstrom Resolution.” Science 349 (6253): 1182–91.
[4] Liao, Maofu, Erhu Cao, David Julius, and Yifan Cheng. 2013. “Structure of the TRPV1 Ion Channel Determined by Electron Cryo-Microscopy.” Nature 504 (7478): 107–12. doi:10.1038/nature12822.
[5] Bartesaghi, Alberto, Doreen Matthies, Soojay Banerjee, Alan Merk, and Sriram Subramaniam. 2014. “Structure of $\beta$ -Galactosidase at 3.2-Å Resolution Obtained by Cryo-Electron Microscopy.” Proceedings of the National Academy of Sciences 111 (32): 11709–14. doi:10.1073/pnas.1402809111.
[6] Zhao, Minglei, Shenping Wu, Qiangjun Zhou, Sandro Vivona, Daniel J. Cipriano, Yifan Cheng, and Axel T. Brunger. 2015. “Mechanistic Insights into the Recycling Machine of the SNARE Complex.” Nature 518 (7537): 61–67.
[7] http://emg.nysbc.org/redmine/projects/public-datasets/wiki/Public_Datasets
[8] Zhu, Yanan, Qi Ouyang, and Youdong Mao. 2016. “A Deep Learning Approach to Single-Particle Recognition in Cryo-Electron Microscopy.” arXiv:1605.05543 [Physics], May. http://arxiv.org/abs/1605.05543.