Page 67 - ISMSCON souvenir 2021

ISMSCON - 2021

             “Research”, a way of systematically searching and studying materials and sources to establish facts and
             reach new conclusions, is not complete without data collection, statistical analysis and interpretation of the
             collected data; this includes steps to organize, summarize and communicate the collected information in
             a meaningful way. This is a vital skill for researchers and professionals in many domains, such as
             Economics, Machine Learning, Data Mining and Health Care. Biostatistics, an applied branch of Statistics,
             is the science that helps (i) in managing medical uncertainties and (ii) researchers in deciding on treatments
             or identifying factors contributing to diseases. It is used extensively in Epidemiology, the basic
             science of Public Health, which uses Statistics and Research Methodologies to arrive at conclusions about
             diseases within a given population and to identify the causes or risks of those diseases. Biostatistics
             also helps in designing clinical trials and drawing conclusions from them. It is important for the investigator
             and the interpreting clinician to understand the basics of Biostatistics for two reasons: (1) to choose the right
             statistical test based on the nature of the data and (2) to judge whether an analysis has been carried out
             correctly. Biostatistical analysis serves as a key to conducting new clinical research; hence, it has become
             one of the foundations of evidence-based clinical practice. Biostatistics involves a complete understanding
             of: (a) Variable types (b) Distribution of data (c) Hypothesis testing (d) Statistical tests (e) Measures of
             association (f) Regression analysis and (g) Diagnostic tests. The work of Biostatisticians focuses
             on Epidemiology, Clinical Trials, Systematic Review and Meta-Analysis, Observational and Complex
             Interventional Studies, Population Genetics, Statistical Genetics and Systems Biology, where they
             are involved in designing, conducting and analyzing studies, calculating sample sizes, measuring random
             errors and interpreting the statistical significance of the results. Biostatistics serves as a boon to medical
             research by preventing fraud in clinical trials; investigating proposed medical treatments; assessing the
             relative benefits of competing therapies; establishing optimal treatment combinations; reducing
             misclassifications; improving knowledge of diseases; and helping to identify new treatments and medical devices.
             Keywords: Statistical analysis, Biostatistics, Epidemiology, Public health, Clinical trials.
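
             As one illustration of the sample size calculations mentioned above, the following is a minimal
             sketch, not taken from the abstract. It assumes a two-arm study comparing means with equal group
             sizes, a known common standard deviation and a two-sided test, and uses the textbook normal
             approximation n = 2 * ((z_(1-alpha/2) + z_(power)) * sigma / delta)^2 per group. The function name
             and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (normal approximation).

    sigma: assumed common standard deviation of the outcome
    delta: smallest difference in means worth detecting
    alpha: two-sided significance level
    power: desired probability of detecting a true difference of size delta
    """
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up so the target power is not undershot


# Hypothetical inputs: SD of 10 units, clinically relevant difference of 5 units.
n_per_group = sample_size_two_means(sigma=10.0, delta=5.0)
print(n_per_group)
```

             Halving the detectable difference roughly quadruples the required sample size, which is why the
             clinically relevant difference has to be fixed before the study is designed.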



              OS33: APPLICATIONS OF MACHINE LEARNING MODELS
              USING SEER DATABASE: A REVIEW

                                                Kiruthika G a, Vasna Joshua b
                                                           Affiliation:
                                            a PhD Scholar, Madras University, Chennai.
                                   b Scientist-C, ICMR-National Institute of Epidemiology, Chennai.
                                                   kiruthikabiostat@gmail.com
             Keywords: machine learning, SEER program, review, cancer
             The Surveillance, Epidemiology, and End Results (SEER) Program is an authoritative source for cancer
             statistics in the United States. It aims at reducing the burden of cancer on the U.S. population. There
             are a number of SEER registries, starting from the year 1975. The latest registry contains information
             about 11,865,152 cancer cases from 2000 to 2018. For each case, there is information about the demographic
             profile, behavior of the patient, cancer stage, cancer type, cause of death, insurance details and other
             site-specific details. Such a huge cancer dataset can be utilized to solve various public health
             problems related to cancer.
             Considering the vastness of the dataset, machine learning is an effective tool for analyzing it. Supervised
             machine learning models can be used for prediction, whereas unsupervised models can be used
             to identify unknown patterns in the data. More specifically, unsupervised models are used for feature
             selection, that is, selecting the most important features in the data to serve as predictors in prediction
             modelling. Feature engineering involves tasks such as feature transformations and aggregations, which
             are essential for handling problems like multicollinearity and for improving the performance of the models.
             When the data volume is huge in terms of both the number of records and the number of features, feature
             engineering becomes complex. Machine learning techniques are used for feature engineering to overcome
             such complexity.
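
             The multicollinearity problem mentioned above can be illustrated with a minimal sketch. It is not a
             method described in the abstract; it is a simple correlation filter, one of many possible feature
             selection steps. The feature names and values are invented toy data and are not drawn from SEER, and
             the 0.9 threshold is an arbitrary assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / sqrt(var_x * var_y)


def drop_collinear(features, threshold=0.9):
    """Greedily keep a feature only if it is not strongly correlated
    (|r| >= threshold) with any feature that has already been kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Invented toy records: age_months is an exact multiple of age_years,
# so the two columns are perfectly collinear.
toy = {
    "age_years": [45, 52, 61, 70, 38, 66],
    "age_months": [540, 624, 732, 840, 456, 792],
    "tumor_size_mm": [12, 30, 8, 25, 40, 15],
}
selected = drop_collinear(toy)
print(selected)
```

             The filter keeps whichever of two correlated features appears first, so the column order acts as a
             priority ranking. With thousands of features, pairwise comparison becomes expensive, which is one
             reason automated techniques are preferred at this scale.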

