Notes

  • Writing Problem Statement

    Using Problem Framing as a guide, students can now construct the Problem Statement & Justification in the Research Background section of the Research Proposal Template.

    1. Problem Statement & Justification

    The research problem is the foundation of your entire proposal. It clearly defines the issue or gap in knowledge that your study aims to address. Use your Problem Framing as a reference to write a strong problem statement and justification:

    Paragraph 1 – Provide Context: Start by pinpointing a specific real-world issue, challenge, or question within your chosen field. 

    Paragraph 2 – Highlight the Gap: Show where current understanding falls short or where there are conflicting viewpoints. Describe the limitations of current methods, with supporting evidence.

    Paragraph 3 – The Justification: Demonstrate why this issue is important and why existing knowledge is insufficient. Justify your proposed approach, including why the application of Machine Learning is needed.

    2. Research Questions

      Research questions are the cornerstone of your research project. They act as a roadmap, guiding your investigation and focusing your analysis. Here’s how to craft effective research questions for your proposal:

      a. Main Research Question

        Your main research question should stem directly from the problem statement you derived from your problem framing. It should be specific and directly address the gaps in knowledge you highlighted.

        b. Sub-questions 

          Provide 2-3 smaller questions that help you answer the main question. Frame your questions in a concise and clear way. Avoid ambiguity and ensure they can be answered through your chosen research methods and data you plan to collect.

          Make sure your questions are measurable and feasible within the scope of your project.

          By following these guidelines, you can develop strong research questions that will guide your research and ensure a well-focused proposal. Remember, your research questions should spark curiosity, provide direction, and ultimately lead to valuable insights!

          3. Research Objective and Hypothesis

          a. Main Objective

            The main objective is the overall goal of your research project. It’s a broad statement that captures the general direction of your investigation and corresponds to the Main Research Question.

            b. Specific Objectives

              Specific objectives break down the main objective into smaller, more manageable steps, each corresponding to a Sub-question. They outline the concrete actions you’ll take to achieve your overall goal. Specific objectives should be SMART (Specific, Measurable, Achievable, Relevant, and Time-bound).

              c. Hypothesis 

                A hypothesis is a specific prediction about the outcome of your research. It’s an educated guess based on existing knowledge or theories. Not all research projects require a hypothesis, particularly exploratory studies.

              1. Problem Framing

                Problem framing is a critical thinking process carried out before starting an ML project (or any other research project). Researchers often spend about 70-80% of their total research time defining and fine-tuning the direction of their research. Think of it as preparing the blueprint of a building, which needs information and revision until the foundation is solid. The design of the building can then be expanded and customized later based on new inputs.

                But the main point of doing problem framing in ML is to decide whether you even need ML in your project.

                Follow the step-by-step guide for conducting problem framing.

                1. Context
                  Provide context from the real-world situation.
                  Explain what is actually happening and why this problem matters.
                2. Question
                  Convert the context into a single, simple, answerable question.
                  e.g. Can fabric appearance estimate exposure time to local microclimate?

                ML Justification Check – Determine whether ML is necessary or if a simpler method can solve the problem.

                1. Define input-output mapping.
                  Specify what data goes in (input) and what the model should predict (output).
                  X (Features): Image of larval head capsules.
                  Y (Label): Larval age/instar.
                2. Select ML task type
                  Determine the appropriate ML task type (classification, regression, clustering).
                3. Identify data representation
                  Decide how data is structured and encoded.
                  e.g. image of larval cephalopharyngeal skeleton labelled according to species.
                4. Define success metrics
                  Establish how model performance will be measured and its meaning in biology.

                  Classification > accuracy, F1
                  Regression > RMSE, MAE
                  Clustering > Silhouette

                  For an undergraduate project proposal, state the success criteria in terms of input and output, e.g. the model successfully classifies cephalopharyngeal skeletons according to species.
                5. Identify assumptions, limitations and biases
                  Explicitly state any underlying assumptions, limitations and biases in the study.
                  e.g. sampling bias, unknown environmental data.
                6. Check feasibility
                  Assess whether the project is solvable within the available time and resources.
                7. Iterate
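                  Based on the assumptions, limitations and biases you identified and on the feasibility check, revisit the input-output mapping, ML task type, data representation and success metrics.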
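
                The framing steps above can be recorded in a small structure so that unfinished items are easy to spot before the next iteration. The sketch below is only one possible way to do this: the ProblemFrame class, its field names, and the example wording are hypothetical and not part of any template. The metric lookup simply mirrors the task-to-metric pairs listed under the success-metrics step.

```python
from dataclasses import dataclass, field

# Candidate metrics per task type, copied from the success-metrics step.
SUGGESTED_METRICS = {
    "classification": ["accuracy", "F1"],
    "regression": ["RMSE", "MAE"],
    "clustering": ["silhouette"],
}


@dataclass
class ProblemFrame:
    """Hypothetical record of one pass through the framing steps."""
    context: str
    question: str
    features: str             # X: what goes into the model
    label: str                # Y: what the model should predict
    task_type: str            # "classification", "regression" or "clustering"
    data_representation: str
    assumptions: list = field(default_factory=list)
    feasible: bool = False    # set to True once the feasibility check passes

    def suggested_metrics(self):
        """Look up candidate metrics for the chosen task type."""
        if self.task_type not in SUGGESTED_METRICS:
            raise ValueError(f"Unknown task type: {self.task_type}")
        return SUGGESTED_METRICS[self.task_type]

    def open_items(self):
        """Return the framing items that still need work in the next iteration."""
        items = []
        for name in ("context", "question", "features", "label", "data_representation"):
            if not getattr(self, name).strip():
                items.append(name)
        if not self.assumptions:
            items.append("assumptions")
        if not self.feasible:
            items.append("feasibility")
        return items


# Illustrative values only; the question and representation wording are assumptions.
frame = ProblemFrame(
    context="",  # not written yet
    question="Can larval head capsule images estimate larval age?",
    features="Image of larval head capsules",
    label="Larval age/instar",
    task_type="classification",
    data_representation="Images labelled by instar",
    assumptions=["Sampling bias"],
)
print(frame.suggested_metrics())
print(frame.open_items())
```

                Anything returned by open_items() is a prompt to go back to the corresponding step before moving on.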


                Practice Problem Framing using this template.

              2. Research Framework in Machine Learning

                This framework presents machine learning research as an iterative scientific process rather than a one-way technical pipeline. Each step contributes to how a research problem is translated into data, models, and eventually meaningful insight. Researchers are advised to use this framework as the basis for their research methodology.

                1. Problem Framing

                Problem framing is the starting point of the research. At this stage, the researcher defines the core question that the study aims to address and translates it into a machine learning problem. This includes identifying the prediction target, determining whether the task involves classification, regression, or both, and clarifying the real-world purpose of the study. A well-framed problem ensures that the later stages of data handling, modeling, and evaluation remain aligned with the actual research objective.

                2. Data Collection

                Once the problem is defined, relevant data must be collected. This step involves gathering the observations, measurements, and images that will be used in the study. The quality and representativeness of the dataset are critical, because machine learning models learn patterns directly from the data provided. Poorly collected or biased data will affect every later step, regardless of how sophisticated the model is.

                3. Data Pre-Processing

                Data pre-processing prepares the dataset for analysis by cleaning errors, handling missing values, normalizing or resizing inputs, organizing labels, and transforming the data into a suitable format. In image-based studies, for example, this may include cropping, scaling, grayscale conversion, contrast adjustment, or background removal. This step improves consistency and helps reduce unwanted variation that may interfere with learning.
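
                As a rough illustration of three of the operations mentioned above (handling missing values, normalizing inputs, and grayscale conversion), the sketch below uses plain Python lists. The function names and sample measurements are made up for the example. Mean imputation is only one of several ways to handle missing values, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which may differ from what a given imaging tool uses.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("No observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rgb_to_grayscale(pixel):
    """Convert one (R, G, B) pixel to a single luminance value (BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


# Hypothetical measurements with one missing entry.
lengths = [2.0, None, 4.0, 6.0]
filled = fill_missing_with_mean(lengths)
normalized = min_max_normalize(filled)
gray = rgb_to_grayscale((255, 255, 255))

print(filled)
print(normalized)
print(gray)
```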

                4. Feature Engineering

                Feature engineering refers to the process of extracting, selecting, or constructing informative representations from the pre-processed data. In conventional machine learning, this may involve manually defining measurements such as length, area, texture, or shape descriptors. In deep learning workflows, it can also involve choosing image representations or latent features that better reveal meaningful structure. This step is important because the way data are represented can strongly influence how well a model can learn useful patterns.
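
                To make the idea of manually defined measurements concrete, here is a minimal sketch that derives a few shape features from an outline given as (x, y) points. The feature names and the choice of descriptors (extent along x as "length", shoelace area, perimeter, and circularity defined as 4πA/P²) are assumptions made for illustration, and the rectangle is a stand-in for a real specimen outline.

```python
import math


def polygon_area(points):
    """Area of a closed polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shape_features(points):
    """Build a small feature dictionary from an outline."""
    if len(points) < 3:
        raise ValueError("An outline needs at least three points")
    area = polygon_area(points)
    perimeter = polygon_perimeter(points)
    xs = [p[0] for p in points]
    return {
        "length": max(xs) - min(xs),  # extent along the x axis
        "area": area,
        "perimeter": perimeter,
        "circularity": 4 * math.pi * area / perimeter ** 2,  # 1.0 for a circle
    }


# Stand-in outline: a 4 x 2 rectangle.
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
feats = shape_features(outline)
print(feats)
```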

                5. Model Training

                Model training is the stage where the algorithm learns from the training data. During this process, the model adjusts its internal parameters so that it can map input data to the intended output. The training step depends on the problem type, the chosen algorithm, and the structure of the data. At this stage, the goal is not only to fit the data but to learn patterns that can generalize beyond the examples already seen.
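
                One of the simplest ways to see "adjusting internal parameters" is a nearest-centroid classifier, where the learned parameters are just the mean feature vector of each class. This is a sketch chosen for readability, not a recommended model for any particular study. The feature values and class names below are invented.

```python
def train_nearest_centroid(features, labels):
    """Learn one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # The centroids are the model's learned parameters.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, x))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Invented training data: two measurements per specimen, two classes.
train_x = [[1.0, 1.0], [1.2, 0.8], [3.0, 3.2], [2.8, 3.0]]
train_y = ["first", "first", "third", "third"]

model = train_nearest_centroid(train_x, train_y)
print(predict(model, [1.1, 1.0]))
print(predict(model, [3.0, 2.9]))
```

                The two query points are not in the training set, which is the point of the exercise: the model is only useful if the learned centroids also classify unseen specimens sensibly.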

                6. Model Testing (Validation)

                After training, the model must be tested on data that were not used during learning. This helps determine whether the model can generalize to new cases rather than simply memorizing the training set. Validation provides an intermediate assessment of model behavior and allows the researcher to monitor overfitting, compare alternative configurations, and decide whether further refinement is needed. In research practice, this step acts as an important checkpoint before final evaluation.
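
                A hold-out split is one common way to keep some data unseen during training. The sketch below assumes a simple random split with a fixed seed. The function names, the 30% validation fraction, and the sample data are illustrative choices, and other schemes such as k-fold cross-validation may suit small datasets better.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples reproducibly and split them into train and validation sets."""
    if len(samples) < 2:
        raise ValueError("Need at least two samples to split")
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split can be repeated
    n_val = round(len(samples) * validation_fraction)
    n_val = min(max(1, n_val), len(samples) - 1)  # keep both sets non-empty
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [samples[i] for i in train_idx], [samples[i] for i in val_idx]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("Label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented dataset: (specimen id, label) pairs.
samples = [(i, "A" if i % 2 == 0 else "B") for i in range(10)]
train, val = train_validation_split(samples, validation_fraction=0.3, seed=42)
print(len(train), len(val))
```

                Comparing accuracy on the training set with accuracy on the validation set is a quick overfitting check: a large gap suggests the model is memorizing rather than generalizing.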

                7. Model Evaluation (Performance Analysis)

                Model evaluation examines how well the trained model performs using appropriate metrics and analytical criteria. The specific evaluation approach depends on the research objective. Classification studies may use accuracy, precision, recall, or F1-score, while clustering studies may use silhouette score or Davies–Bouldin index, and regression studies may use RMSE or MAE. In research settings, this stage may also include robustness checks, statistical comparison, and other analyses to determine whether the observed performance is stable, meaningful, and scientifically defensible.
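
                For reference, several of the metrics named above can be computed by hand, as in the sketch below. It treats one class as "positive" for precision, recall, and F1, which is an assumption that fits binary problems; multi-class studies usually average these per class. The labels and numbers are invented.

```python
import math


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error for regression outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error for regression outputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented classification results.
p, r, f1 = precision_recall_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"], positive="A")
print(p, r, f1)

# Invented regression results.
err_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
err_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(err_rmse, err_mae)
```

                RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few specimens dominate the error.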

                8. Insight and Interpretation

                Machine learning research should not end with numerical performance alone. The final step is to interpret what the results actually mean in relation to the research question. This may involve identifying why one representation performs better than another, understanding which features contribute most to the outcome, or explaining how the findings relate back to the scientific or applied problem. Insight and interpretation transform model output into research knowledge.

                The dashed arrows in the diagram represent iteration, which is one of the defining characteristics of machine learning research. Results from evaluation and interpretation often lead the researcher back to earlier stages such as pre-processing, feature engineering, model design, or even problem framing itself. This means the workflow is not strictly linear. Instead, it is a cycle of refinement in which each round of analysis improves understanding of the data, the model, and the research question.

              3. Guide: 2D Landmark Data Acquisition

                Step 1: Create a TPS File with tpsUtil

                1. Select Operation: Choose “Build tps file from images”.
                2. Input Directory: Browse and select the folder containing specimen images.
                3. Output File: Name the new .tps file and choose where to save it.
                4. Create: Click “Setup” to verify the images are listed, then click “Create.” This file initializes all landmarks to zero for each image.

                Step 2: Set Up tpsDig for Digitization

                Open tpsDig to visually place landmarks on your specimens:

                1. Input Source: Go to File > Input Source > File and select the .tps file you just created.
                2. Set Scale (Optional): If your images include a ruler, use the “Set Scale” tool to drag a line across a known distance (e.g., 1 cm) to calibrate measurements. If not, it defaults to 1 mm.
                3. Define Landmarks: It is critical to have a set of homologous points identified beforehand (e.g., tip of the dorsal cornua in cephalopharyngeal skeleton, costagial break in Phoridae wing) to ensure consistency across all specimens.
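
                The arithmetic behind scale calibration is simple: a known real-world distance divided by the same distance measured in pixels gives a units-per-pixel factor. The sketch below illustrates that calculation only and is not a description of how tpsDig implements it. The function names, pixel positions, and ruler length are hypothetical.

```python
import math


def scale_from_reference(p1, p2, known_length):
    """Return real-world units per pixel from two points marked on a ruler."""
    pixel_distance = math.dist(p1, p2)
    if pixel_distance == 0:
        raise ValueError("Reference points must not coincide")
    return known_length / pixel_distance


def to_real_units(landmarks, scale):
    """Convert pixel coordinates to real-world units using the scale factor."""
    return [(x * scale, y * scale) for x, y in landmarks]


# Hypothetical example: a 10 mm ruler segment spans 500 pixels in the image.
scale = scale_from_reference((100, 200), (600, 200), known_length=10.0)
converted = to_real_units([(250, 400)], scale)
print(scale)
print(converted)
```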

                Step 3: Digitizing Landmarks

                1. Activate Tool: Click the “Digitize Landmarks” icon (the crosshair/target icon).
                2. Place Points: Click on the specimen in the exact order specified by your protocol⚠️. The software will automatically number the points.
                3. Edit/Correct: If you misplace a point, switch to “Edit Mode” (arrow icon) to drag and reposition the landmark.
                4. Navigate Specimens: Use the “Next” arrow to move to the next image and repeat the process.

                Step 4: Save and Verify Data

                1. Save Data: Go to File > Save Data to update the .tps file with the X and Y coordinates of your landmarks.
                2. Verify: You can open the .tps file in a text editor to see that the LM (landmark) value now correctly reflects the number of points captured for each specimen.
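
                The manual check in a text editor can also be scripted. The sketch below assumes the usual plain-text TPS layout: an LM=n line, then n lines of x and y coordinates, then IMAGE=, ID= and optionally SCALE= lines. The sample content, file names, and function names are hypothetical, and curve or outline records are simply ignored.

```python
def parse_tps(text):
    """Parse TPS text into a list of specimens with their landmarks."""
    specimens = []
    current = None
    remaining = 0  # coordinate lines still expected for the current specimen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.upper().startswith("LM="):
            declared = int(line.split("=", 1)[1])
            current = {"declared": declared, "landmarks": [], "image": None}
            specimens.append(current)
            remaining = declared
        elif current is None:
            continue  # skip anything before the first LM= line
        elif remaining > 0 and "=" not in line:
            x, y = line.split()[:2]
            current["landmarks"].append((float(x), float(y)))
            remaining -= 1
        elif line.upper().startswith("IMAGE="):
            current["image"] = line.split("=", 1)[1]
    return specimens


def check_landmark_counts(specimens, expected):
    """Return the image names whose landmark count differs from the protocol."""
    return [
        s["image"]
        for s in specimens
        if s["declared"] != expected or len(s["landmarks"]) != expected
    ]


# Hypothetical file content: the second specimen has not been digitized yet.
sample = """LM=3
10.0 20.0
30.0 40.0
50.0 60.0
IMAGE=spec1.jpg
ID=0
SCALE=0.02
LM=0
IMAGE=spec2.jpg
ID=1
"""

specimens = parse_tps(sample)
problems = check_landmark_counts(specimens, expected=3)
print(problems)
```

                To run it on a real file, read the file into a string first, for example with pathlib.Path("your_file.tps").read_text(), where the file name is a placeholder.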

                Link to Video: Landmarks acquisition in 2D using tpsDig

                Link to Software: tpsUtil, tpsDig

                Source: Morpho Prof YouTube Channel, SB Morphometrics

              4. FYP Discussion 2026/2

                TL;DW

                • Start living in the Machine Learning environment. Learn the new terminology and keywords.
                • Enroll in the Google Developers foundational course: Introduction to Machine Learning and Problem Framing.
                • Explore different research approaches by prioritizing the quality of data.

                Link to video discussion 🎥

              5. FYP26 Weekly Worksheet

                Week 1

                Date: 9 – 13 Mar 2026

                Intro FYP Research

                • Title discussion
                • RP format
                • Research ecosystem