@gekeshi
2016-12-19T09:05:43.000000Z
Cryo-EM
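The paper aims to use a CNN to pick out qualified particles from cryo-EM MRC micrographs [1].
Overall, the neural network in this method scores each small square patch of an MRC image to produce a score map; several filtering steps are then applied to pick the particles.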
The γ-secretase and spliceosome datasets were obtained from Dr. Yigong Shi’s lab at Tsinghua University
γ-secretase:
Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 4UIS), and the EM maps have been deposited in the Electron Microscopy Data Bank, www.ebi.ac.uk/pdbe/emdb (accession no. EMD-2974)
TRPV1[4]
The TRPV1 dataset was downloaded from EMPIAR (entry ID: EMPIAR-10005)
β-galactosidase[5]
Data deposition: Density map derived by cryo-electron microscopy and the fitted atomic model have been deposited in the EMDataBank (EMDB), www.emdatabank.org (EMDB ID code EMD-5995), and the Protein Data Bank (PDB), www.pdb.org (PDB ID code 3J7H), respectively.
N-ethylmaleimide sensitive factor complex[6]
Public dataset[7]
Some public MRC data, with annotations.
the performance of our fully automated particle picking method was relatively robust at different defocus levels
1. The initial training data are obtained from the known particles of other molecular complexes whose structures have been previously solved via cryo-EM.
2. Training a pre-trained model.
3. First particle picking iteration.
4. Training model using the picking result from step 3.
5. Second particle picking iteration.
scoring
a sliding window (i.e., a square box) of a fixed size is used to scan each micrograph from the top left corner to the bottom right corner with a constant step size. The box size of the sliding window is chosen to be slightly larger than the particle size, which can be easily estimated and defined as a parameter.
The prediction score between 0 and 1 output by the CNN model, which represents the probability of being a particle at the current position, is then assigned to the center of the corresponding window.
What does the score map look like?
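One way to picture the score map is to run the sliding window on a toy image. The sketch below is only an illustration under stated assumptions: `score_map`, `toy_score`, and the 6x6 `micrograph` are hypothetical names and data, and `toy_score` (mean patch intensity) is a stand-in for the CNN, not the paper's model. Plain Python lists keep it dependency-free; a real pipeline would more likely use array libraries.

```python
def score_map(micrograph, box_size, step, score_fn):
    """Slide a box_size x box_size window over the image and score each patch.

    Returns a dict mapping the (row, col) centre of every window to its score.
    """
    height, width = len(micrograph), len(micrograph[0])
    half = box_size // 2
    scores = {}
    # Scan from the top-left corner to the bottom-right corner with a constant step.
    for top in range(0, height - box_size + 1, step):
        for left in range(0, width - box_size + 1, step):
            patch = [row[left:left + box_size]
                     for row in micrograph[top:top + box_size]]
            # The score is assigned to the centre of the current window.
            scores[(top + half, left + half)] = score_fn(patch)
    return scores


def toy_score(patch):
    """Hypothetical stand-in for the CNN: mean intensity clipped to [0, 1]."""
    flat = [value for row in patch for value in row]
    return max(0.0, min(1.0, sum(flat) / len(flat)))


# Toy 6x6 "micrograph" with one bright 2x2 blob at rows 2-3, columns 2-3.
micrograph = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        micrograph[r][c] = 1.0

scores = score_map(micrograph, box_size=2, step=1, score_fn=toy_score)
best = max(scores, key=scores.get)
print(best, scores[best])  # (3, 3) 1.0
```

The score map is therefore a grid of probabilities, one per window position, that peaks where the window is best aligned with a particle and falls off as the window slides away from it.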
cleaning
As ice noise can easily introduce false positives during the picking process, we also employ a cleaning step to discard these false particles from the candidate list.
we first connect any two neighboring pixels if their prediction scores are both above a threshold, and then examine the size of each connected domain (i.e., the portion of all connected pixels).
If the size of a connected domain is larger than a cutoff value, it is regarded as a potential false positive probably due to ice noise.
I do not understand how the connected domain step works.
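A plausible reading of the cleaning step is ordinary connected-component analysis on the thresholded score map. The sketch below assumes 4-connectivity (up, down, left, right), which the excerpt does not specify; `remove_large_domains`, `score_threshold`, and `max_domain_size` are hypothetical names, and zeroing the scores is just one way to discard a domain.

```python
from collections import deque


def remove_large_domains(score_map, score_threshold, max_domain_size):
    """Return a copy of score_map with oversized high-score regions zeroed out."""
    rows, cols = len(score_map), len(score_map[0])
    cleaned = [row[:] for row in score_map]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or score_map[r][c] <= score_threshold:
                continue
            # Breadth-first search collects one connected domain: all pixels
            # reachable through neighbours whose scores are above the threshold.
            domain, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                domain.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and score_map[ny][nx] > score_threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # A domain larger than the cutoff is treated as likely ice noise.
            if len(domain) > max_domain_size:
                for y, x in domain:
                    cleaned[y][x] = 0.0
    return cleaned


score_map = [
    [0.9, 0.9, 0.9, 0.0, 0.0],
    [0.9, 0.9, 0.9, 0.0, 0.8],
    [0.9, 0.9, 0.9, 0.0, 0.0],
]
cleaned = remove_large_domains(score_map, score_threshold=0.5, max_domain_size=4)
```

In this toy input the 3x3 block of high scores forms one domain of nine pixels, which exceeds the cutoff of four and is discarded, while the isolated 0.8 pixel forms a domain of size one and survives. The intuition is that a real particle produces a compact blob of high scores, whereas an ice patch produces a large contiguous one.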
filtering
we aim to refine the current set of particle candidates and also identify the center coordinates of the final remaining particles from the scored map. We first introduce a concept of peak window, the size of which is related to the minimum distance between centers of two possible particles.
Then the position with the maximum prediction score in each peak window is chosen and output as the center of a particle. We also remove bad particle candidates in which the number of extreme pixels is more than three standard deviations away from the mean.
How exactly is the peak window chosen?
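The excerpt does not say how the peak windows are placed, so the following is only one possible interpretation: a greedy non-maximum suppression in which the highest-scoring position is accepted first and every other candidate inside its peak window is dropped. `pick_peaks`, `peak_window`, and `min_score` are hypothetical names, and the extreme-pixel filter is not included.

```python
def pick_peaks(score_map, peak_window, min_score):
    """Greedy peak selection: keep at most one centre per peak window.

    Returns a list of (row, col, score), ordered from highest to lowest score.
    """
    half = peak_window // 2
    candidates = [
        (score, r, c)
        for r, row in enumerate(score_map)
        for c, score in enumerate(row)
        if score >= min_score
    ]
    # Highest scores first; ties are broken by position so the result is deterministic.
    candidates.sort(key=lambda item: (-item[0], item[1], item[2]))
    picked = []
    for score, r, c in candidates:
        # Accept a candidate only if it lies outside the peak window
        # of every centre that has already been accepted.
        if all(abs(r - pr) > half or abs(c - pc) > half for pr, pc, _ in picked):
            picked.append((r, c, score))
    return picked


score_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0, 0.0],
    [0.2, 0.9, 0.6, 0.0, 0.0, 0.0],
    [0.1, 0.6, 0.3, 0.0, 0.7, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.6, 0.7],
]
centers = pick_peaks(score_map, peak_window=3, min_score=0.5)
print([(r, c) for r, c, _ in centers])  # [(1, 1), (2, 5)]
```

Each cluster of high scores collapses to its single best position, so two particles are reported instead of seven overlapping candidates. Because the output is already ordered by score, it also covers the sorting step in this simplified form.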
sorting
we sort the remaining particle candidates according to their prediction scores.
iteration
we use the particles picked by the previous CNN classifier which was trained over the known particles of other molecules to further refine the CNN model. After a certain number of iterations, the algorithm outputs the top list of the highest-rated particles.
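The preliminarily picked target particles are used to further train the scoring network, after which particle picking is run once more.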
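The control flow of this iteration can be sketched as below. Everything here is illustrative: `iterative_picking`, `train`, and `pick` are hypothetical names, and the toy stand-ins (a "model" that is just the mean of its training values) replace the CNN purely so the loop can be run end to end.

```python
def iterative_picking(train, pick, initial_samples, micrographs,
                      n_rounds=2, top_k=None):
    """Pre-train on known particles, then alternate picking and retraining."""
    # Initial model trained on particles from other, previously solved complexes.
    model = train(None, initial_samples)
    picked = []
    for round_index in range(n_rounds):
        # Pick with the current model and rank candidates by score.
        picked = sorted(pick(model, micrographs),
                        key=lambda item: item[0], reverse=True)
        # Refine the model on its own picks, except after the final round.
        if round_index < n_rounds - 1:
            model = train(model, [particle for _, particle in picked])
    return picked[:top_k]


# Toy stand-ins (assumptions): a "particle" is a number, the "model" is the
# mean of the training set, and the score is closeness to that mean.
def toy_train(previous_model, samples):
    return sum(samples) / len(samples)


def toy_pick(model, candidates):
    scored = [(1.0 - abs(value - model), value) for value in candidates]
    return [(score, value) for score, value in scored if score > 0.75]


result = iterative_picking(toy_train, toy_pick,
                           initial_samples=[0.5, 0.7],
                           micrographs=[0.1, 0.55, 0.75, 0.8, 0.9])
print([particle for _, particle in result])  # [0.75, 0.8, 0.55, 0.9]
```

In the toy run, the first round picks three candidates with a model tuned to other data; retraining on those picks shifts the model toward the target, and the second round recovers a fourth candidate that was missed at first.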
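A small number of target particles are picked manually to train the network. This semi-automated picking is used as a comparison against the fully automated picking.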
An alternative training scheme is to let the user manually select a small number of particles as positive samples to train the CNN model and initialize the particle selection process
Thus, an additional effective method for evaluating the practicability of an automated particle picking approach is to further examine the 2D clustering and class averaging results of the identified particles.
Since 2D classification can filter out false-positive particles, could it be done together with autopicking? (This is already implemented in the GitHub project.)
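This paper from Peking University [8] manually picks some training data and trains a scoring network on the particle images; at test time it scores each scanning window, selects windows by a score threshold, and then filters false positives using the standard deviation.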