AP19679015 “Application of new data collection and processing methods based on machine learning algorithms for analysis of cellular proteome”


A new generation of instruments, new methods for analyzing biological samples, and advanced software have significantly improved protein identification capabilities. For example, the global Human Proteome Project (HPP) initiative increased the number of identified proteins from 13,588 in 2011 to 17,874 in 2020, covering more than 7 orders of magnitude of the dynamic range of the human proteome. These advances are extremely important because biomarkers of various diseases are usually expressed in cells in small quantities that are sometimes difficult to detect with traditional analytical methods. In addition, modern proteomics methods offer new possibilities for a deeper understanding of the mechanisms of cell functioning. Thus, expanding the analytical capabilities of the proteomics platform of the National Center of Biotechnology will make it possible to address problems in both basic and applied science more effectively, above all in medicine and healthcare.


The project aims to introduce a new workflow that combines data-independent acquisition (DIA) with data processing by programs based on machine learning algorithms, for in-depth analysis of the subcellular distribution of proteins, their post-translational modifications, and protein-protein interactions.

The goals of the project will be achieved using both traditional data acquisition methods, such as data-dependent acquisition (DDA) and multiple reaction monitoring (MRM), and the new DIA method, together with various software tools (Spectronaut, PEAKS, MaxDIA, DIA-NN) to process the results.

The model tasks of the project are to detail the subcellular distribution of proteins (nucleus, organelles, and cytoplasm) and to study in depth post-translational modifications (ubiquitination, sumoylation), protein-protein interactions, and their logical relationships, using DNA repair proteins, nuclear membrane proteins, and pluripotency transcription factors as examples. To solve these problems, traditional and new methods of cell fractionation will be used, as well as new methods of enzymatic in vivo protein tagging, namely Proximity Utilizing Biotinylation (PUB) and proximity biotinylation with the TurboID mutant biotin ligase. The measurable indicators of these steps will be the results of qualitative and quantitative analysis of samples obtained with the different workflows: DDA → Mascot, MRM → Skyline, and DIA → ML/DL.
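One simple way to summarize the qualitative comparison between workflows is the overlap of the protein identifications they report. The sketch below is an illustration only: the identifiers are made-up placeholders rather than project data, and the function name `compare_identifications` is hypothetical.

```python
from itertools import combinations


def compare_identifications(ids_by_workflow):
    """Return shared-ID counts and Jaccard indices for every pair of workflows."""
    summary = {}
    for a, b in combinations(sorted(ids_by_workflow), 2):
        set_a, set_b = set(ids_by_workflow[a]), set(ids_by_workflow[b])
        shared = set_a & set_b
        union = set_a | set_b
        # Jaccard index: shared identifications divided by all identifications.
        jaccard = len(shared) / len(union) if union else 0.0
        summary[(a, b)] = {"shared": len(shared), "jaccard": jaccard}
    return summary


# Placeholder identifiers (hypothetical, not real accessions or results).
example = {
    "DDA-Mascot": ["P1", "P2", "P3"],
    "DIA-ML": ["P2", "P3", "P4", "P5"],
}
result = compare_identifications(example)
```

In practice the input lists would come from the protein-level reports exported by each program after applying the same false discovery rate threshold; that export step is not shown here.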

Expected results

  1. A subcellular fractionation protocol for HEK293T and CHO cells, compatible with LC-MS/MS sample preparation protocols, will be optimized.
  2. Proteins in the subcellular fractions of HEK293T and CHO cells will be identified and quantified using ML/DL programs.
  3. The genetic constructs BirA-UBA-X and BirA-UIM-X (X = GFP, RAD18, SOX2, OCT4, NANOG) will be created, and results on the biotinylation of ubiquitinated and sumoylated partner proteins in HEK293T and CHO cells will be obtained.
  4. Ubiquitinated and sumoylated partner proteins of RAD18, POLH, SOX2, OCT4, and NANOG will be identified and quantified using ML/DL programs.
  5. TurboID-X genetic constructs (X = EMD, NRM, SOX2, OCT4, NANOG, RAD18, POLH) will be created, and results on the biotinylation of their partner proteins in HEK293T and CHO cells will be obtained.
  6. Partner proteins of EMD, NRM, SOX2, OCT4, NANOG, RAD18, and POLH will be identified and quantified using ML/DL programs.

Project Leader

Kulyyassov Arman Tabylovich, project leader – Cand. Sc. (Chemistry), Leading Researcher at the Laboratory of Proteomics and Mass Spectrometry of the National Center of Biotechnology (NCB) of the Republic of Kazakhstan. h-index: 10 (Scopus). Scopus Author ID: https://www.scopus.com/authid/detail.uri?authorId=7004152301; Web of Science Researcher ID: https://www.webofscience.com/wos/author/record/K-9148-2017; ORCID iD: https://orcid.org/0000-0002-7932-5689

Members of the research team

Akhmetollayev Ilyas Amirkhanovich (h-index – 1), Cand. Sc. (Biology), Leading Researcher, Head of the Organic Synthesis Laboratory of LLP «National Center of Biotechnology», specializes in molecular biology and the chemical synthesis and modification of oligonucleotides. He has 20 years of research experience and is the author of 5 articles. He completed an internship in oligonucleotide synthesis in Russia. His role in the project is the chemical synthesis of oligonucleotides and their modifications.

Manat Esbol (h-index – 1), Ph.D., Researcher at the Stem Cell Laboratory of LLP «National Center of Biotechnology». He graduated from the Faculty of Biology of Al-Farabi Kazakh National University (2006-2010) with a major in biotechnology and holds a Master’s degree in natural sciences from L.N. Gumilyov Eurasian National University (2010-2012). He completed an internship and doctoral studies at Lund University (Sweden) under the supervision of Dr. Jessica Abbott (2016-2021) and defended his PhD thesis in 2021. He has experience in bioinformatics, particularly in the R programming language and statistical data processing.

Makhsatova Saya, Bachelor of Biochemical Engineering, University of Debrecen (Hungary), 2019-2023.

Publications and patents of the project leader and research team members related to the project topic

  1. Shoaib M., Kulyyassov A., Robin C., Winczura K., Tarlykov P., Despas E., Kannouche P., Ramanculov E., Lipinski M., Ogryzko V. PUB-NChIP – “in vivo biotinylation” approach to study chromatin in proximity to a protein of interest // Genome Research.-2013.-Vol.23, №2.-P.331-340. doi:10.1101/gr.134874.111 (Q1)
  2. Mukanov, K.K., Adish, Z.B., Mukantayev, K.N., Tursunov, K.A., Kairova, Z.K., Kaukabayeva, G.K., Kulyyassov, A.T. and Tarlykov, P.V. Recombinant expression and purification of adenocarcinoma GPR161 receptor.//Asia Pac. J. Mol. Biol. Biotechnol.- 2019.-Vol. 27 (4) .-P. 85-95. DOI: 10.35118/apjmbb.2019.027.4.10 (Q4)
  3. Kulyyassov A., Ogryzko V. In Vivo Quantitative Estimation of DNA-Dependent Interaction of Sox2 and Oct4 Using BirA-Catalyzed Site-Specific Biotinylation // Biomolecules. ‒ 2020. ‒ Vol. 10, № 1. DOI: 10.3390/biom10010142 (Q1)
  4. Adish Zh., Mukantayev K., Tursunov K., Ingirbay B., Kanayev D., Kulyyassov A., Tarlykov P., Mukanov K., Ramankulov Y. Recombinant Expression and Purification of Extracellular Domain of the Programmed Cell Death Protein Receptor // Reports of Biochemistry and Molecular Biology. ‒ 2020. ‒ Vol.8, №4. ‒ P.347-357. http://rbmb.net/article-1-391-en.html (Q3)
  5. Kulyyassov A., Fresnais M., Longuespee R. Targeted liquid chromatography-tandem mass spectrometry analysis of proteins: Basic principles, applications, and perspectives // Proteomics. ‒ 2021. ‒ Vol. 21, № 23-24, e2100153. – P.1-20. – doi: 10.1002/pmic.202100153, PMID: 34591362, IF3.984, Q2 (2020, WoS), percentile 73 on Biochemistry, CiteScore 6.3 (2020, Scopus);
  6. Kulyyassov A. Application of Skyline for Analysis of Protein–Protein Interactions In Vivo // Molecules. ‒ 2021. ‒ Vol. 26, № 23, 7170, – P.1-11. – doi: 10.3390/molecules26237170, PMID: 34885753, IF4.412, Q2 (2020, WoS), percentile 74 on Chemistry, CiteScore 4.7 (2020, Scopus);
  7. Kulyyassov A., Ramankulov Y., Ogryzko V. Generation of Peptides for Highly Efficient Proximity Utilizing Site-Specific Biotinylation in Cells // Life (Basel). ‒ 2022. ‒ Vol. 12, № 2, 300. ‒ P.1-14. – doi: 10.3390/life12020300, PMID: 35207587, IF3.817, Q2 (2020, WoS), percentile 50 on General Biochemistry, Genetics and Molecular Biology, CiteScore 2.6 (2020, Scopus);
  8. Kanayev D., Abilmazhenova D., Akhmetollayev I., Sekenova A., Ogay V., Kulyyassov A. Detection of Recombinant Proteins SOX2 and OCT4 Interacting in HEK293T Cells Using Real-Time Quantitative PCR // Life (Basel). ‒ 2023. ‒ Vol. 13, № 1. ‒ P.1-10. – https://doi.org/10.3390/life13010107, PMID: 36676054, IF3.817, Q2 (2020, WoS), percentile 50 on General Biochemistry, Genetics and Molecular Biology, CiteScore 2.6 (2020, Scopus);
  9. Kulyyassov A.T., Ramanculov E.M., Ogryzko V.V. Eurasian patent No. 034880. Application No. 201500724. Filing date 28.05.2015. Publication and grant date 01.04.2020. Use of the biotin ligase BirA and the biotin acceptor peptides BAP1070 and BAP1108 for detecting protein-protein interactions in vivo using the BirA/BAP1070 and BirA/BAP1108 pairs. https://www.eapo.org/ru/patents/reestr/patent.php?id=34880
  10. Kulyyassov A.T., Ogryzko V.V. Recombinant plasmid pcDNA3.1(+)-BAP-HP1a encoding the human heterochromatin protein HP1a and providing its expression in HEK293T cells. Patent of the Republic of Kazakhstan for an invention, registration number 2013/1352.1. No. 30034, bulletin No. 6, http://kazpatent.kz/images/bulleten/2015/gazette/pdf/2-201506.pdf
  11. Kulyyassov A.T., Ogryzko V.V. Recombinant plasmid pcDNA3.1(+)-BAP-KAP1 encoding the human transcriptional cofactor protein KAP1 and providing its expression in HEK293T cells. Patent of the Republic of Kazakhstan for an invention, registration number 2013/1377.1. No. 30035, bulletin No. 6, http://kazpatent.kz/images/bulleten/2015/gazette/pdf/2-201506.pdf
  12. Kulyyassov A., Shoaib M., Pichugin A., Kannouche P., Ramanculov E., Lipinski M., Ogryzko V. PUB-MS: A Mass Spectrometry-based Method to Monitor Protein-Protein Proximity in vivo //J. Proteome Res.-2011 .-Vol.10, No.10.-P.4416-4427. doi: 10.1021/pr200189p.
  13. Kulyyassov A., Shoaib M., Ogryzko V. Use of in vivo biotinylation for chromatin immunoprecipitation // Curr. Protoc. Cell Biol.-2011.- Chapter 17, Unit17.12. doi: 10.1002/0471143030.cb1712s51.
  14. Kulyyassov A.T., Ramanculov E.M., Ogryzko V.V. In vivo Biotinylation Based Method for the Study of Protein-Protein Proximity in Eukaryotic Cells // Cent Asian J Glob Health. – 2014 Jan 24;2(Suppl):96. doi: 10.5195/cajgh.2013.96. eCollection 2013.
  15. Kulyyassov A., Zhubanova G, Ramanculov E, Ogryzko V. Proximity Utilizing Biotinylation of Nuclear Proteins in vivo. // Cent Asian J Glob Health. – 2015 Jun 15;3(Suppl):165. doi: 10.5195/cajgh.2014.165. eCollection 2014.
  16. Issabekova A.S., Zhunusova M.S., Ramanculov E.M., Kulyyassov A.T. Comparison and optimization of transient transfection methods at HEK293T cell line. // Eurasian journal of applied biotechnology.-2017.- No.1.-P.38-43. DOI: 10.11134/btp.1.2017.5
  17. Kulyyassov A.T., Ramanculov E.M., Ogryzko V.V. Cloning and expression of recombinant protein of SUMO, fused with biotin acceptor peptide // Eurasian journal of applied biotechnology. – 2018. -№1. – P.37-41. DOI: 10.11134/btp.1.2018.6
  18. Kulyyassov A.T., Ramanculov E.M. Applications of the Impact II high resolution quadrupole time-of-flight (QTOF) instrument for shotgun proteomics // Eurasian journal of applied biotechnology. – 2018. -№3. – P.3-18. DOI: 10.11134/btp.3.2018.1
  19. Yessimseitova A.K., Shustov A.V., Ahmetollaev I.A., Krassavin V.F., Kakimzhanova A.A. Molecular-genetic certification of potato varieties and forms using SSR-markers // Eurasian Journal of Applied Biotechnology, 2015, №2, pp. 4-16. DOI: 10.11134/btp.2.2015.6.

Results achieved


In the first phase of the calendar plan, the protocol for subcellular fractionation of HEK293T and CHO cells was optimized.

Chemical fractionation methods, as opposed to centrifugation or ultracentrifugation methods, require less effort and can quickly produce a minimum number of fractions. However, their disadvantage is the use of surfactants such as Triton X-100 or sodium dodecyl sulfate (SDS), which are incompatible with LC-MS/MS protocols. More recently, Japanese researchers proposed a subcellular fractionation method that uses reagents compatible with subsequent proteomic analysis, such as sodium deoxycholate (SDC) and sodium lauroyl sarcosinate (SLS). In this project, we used this technique for the subcellular fractionation of HEK293T and CHO cells and compared it with the traditional method of cell lysis using 0.5% Triton X-100.

We optimized a spectrophotometric method for determining the residual amount of SDC in small sample amounts intended for analysis by liquid chromatography-tandem mass spectrometry (LC-MS/MS). During the study, we found limitations related to the loss of linearity of the calibration curve used to quantify residual detergent at low pH and in the presence of certain ions in buffer solutions, such as Cl⁻ and NO₃⁻, especially at higher salt concentrations. Chloride, nitrate, and acid ions promote the transfer of methylene blue aggregates from the aqueous phase to the chloroform phase; these aggregates also absorb at 655 nm and therefore interfere with the measurement. Using the optimized protocol, we demonstrated that a single extraction with ethyl acetate reduces the amount of SDC by a factor of 20 (from 0.5% to 0.025%). We also found that polystyrene 96-well plates can be used to measure chloroform samples, but only within a short period of time. This enables high-throughput experiments with smaller amounts of reagents and solvents and more measurements for statistical analysis.
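The quantification step described above can be illustrated with a minimal sketch: a linear calibration curve is fitted to standards, an unknown absorbance at 655 nm is converted to an SDC concentration, and the fold reduction after extraction is computed. The standard concentrations and absorbance values below are invented for illustration and are not measured data; the function names are hypothetical, and the sketch assumes the calibration stays linear, which the text notes is not the case at low pH or high salt concentrations.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the calibration line to estimate concentration."""
    return (absorbance - intercept) / slope


def fold_reduction(before, after):
    """Ratio of detergent content before and after extraction."""
    return before / after


# Hypothetical SDC standards (% w/v) and absorbance readings at 655 nm.
standards_pct = [0.0, 0.01, 0.02, 0.04]
absorbance_655 = [0.05, 0.15, 0.25, 0.45]

slope, intercept = fit_line(standards_pct, absorbance_655)

# Hypothetical reading for a sample after ethyl acetate extraction.
estimated = concentration_from_absorbance(0.30, slope, intercept)

# Figures from the text: 0.5% SDC before and 0.025% after a single extraction.
reduction = fold_reduction(0.5, 0.025)
```

With the figures quoted in the text, `fold_reduction(0.5, 0.025)` reproduces the reported 20-fold decrease. A real analysis would also check the residuals of the fit before trusting the inverted concentrations.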

In the second phase of the calendar plan, the obtained subcellular fractions of HEK293T and CHO cells are being analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in DDA mode.

Publications and patents


One article has been accepted for publication in a domestic journal recommended by COASSHE.

Makhsatova Saya, Kulyyassov Arman. Quantification of sodium deoxycholate in proteomics sample preparation using methylene blue // Eurasian journal of applied biotechnology. – 2023.