May 12, 2025

5.6 Web Apps Using AI Models

Today’s lesson, 5.6 Web Apps Using AI Models, dives into a more advanced way of building and using AI, specifically for teams comfortable with Python programming and Jupyter Notebooks.

Important Note: This lesson is significantly more technical than previous ones. If your team is primarily using block-based coding like Thunkable or App Inventor and simpler AI tools like Teachable Machine, this Python-based approach might be too complex for the ICT Club timeline. However, if your team does have Python experience and wants to build custom AI models with code, this lesson offers a powerful pathway!

Section 1: The Python/Streamlit AI Workflow

This lesson focuses on using Python libraries, especially scikit-learn, within an environment like Jupyter Notebooks to train your own machine learning models from scratch using data (often from spreadsheets or online sources). Then, it shows how to deploy that trained model into an interactive web application using another Python library called Streamlit.

The basic Machine Learning workflow remains the same:

  1. Dataset: Get your data (text, numbers, images, etc.).
  2. Find Patterns (Train): Use Python code to clean the data and train an AI model.
  3. Make Prediction: Use the trained model within a web app (Streamlit) to make predictions on new input.
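
For teams that want to see these three steps in code right away, here is a minimal sketch using scikit-learn's built-in Iris dataset (the dataset and model choice here are just for illustration, standing in for your own data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Dataset: load features (X) and labels (y)
X, y = load_iris(return_X_y=True)

# 2. Find Patterns (Train): fit a model on part of the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 3. Make Prediction: apply the trained model to unseen input
print(model.predict(X_test[:5]))
```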

Section 2: Preparing Your Data (Preprocessing)

Raw data, like datasets downloaded from Kaggle or collected via surveys, is often messy. Before you can feed it to an AI algorithm, you need to clean and format it. This is called Preprocessing, and it’s often the most time-consuming part! In Python (using libraries like Pandas), this involves steps like:

  • Handling missing values (deciding whether to remove rows/columns or fill in gaps).
  • Converting text categories (like "Yes"/"No" or "Male"/"Female") into numbers that algorithms can understand (e.g., 0/1).
  • Scaling numerical data so that features with large values don’t unfairly dominate the model.
  • Selecting the most relevant features (columns) to use for training.
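
As a rough illustration (not the lesson's exact code), a few of these steps in Pandas might look like this, assuming a DataFrame with columns similar to the Kaggle stroke dataset:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("healthcare-dataset-stroke-data.csv")  # filename may differ

# Handle missing values: fill gaps in a numeric column with its median
df["bmi"] = df["bmi"].fillna(df["bmi"].median())

# Convert a text category into numbers the algorithm can use
df["ever_married"] = df["ever_married"].map({"No": 0, "Yes": 1})

# Scale numerical features so large values don't dominate
scaler = StandardScaler()
df[["age", "avg_glucose_level"]] = scaler.fit_transform(
    df[["age", "avg_glucose_level"]]
)
```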

Section 3: Splitting the Data (Train vs. Test)

Just like with simpler platforms, you MUST split your preprocessed data into two parts:

  • Training Set (e.g., 75-80%): Used to teach the AI model.
  • Testing Set (e.g., 20-25%): Used only after training to evaluate how well the model performs on data it has never seen before.

Python’s scikit-learn library has functions (like train_test_split) that make this splitting process easy.
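
A typical call looks like this, assuming X holds your feature columns and y the target column (the column name "stroke" is just an example):

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["stroke"])  # features
y = df["stroke"]                 # labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,   # hold back 25% of the data for testing
    random_state=42,  # makes the split reproducible
)
```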

Section 4: Choosing Your Algorithm (The Model’s Brain)

Python and scikit-learn offer many different algorithms to build your model. A key first step is deciding if you need:

  • Classification: To predict distinct categories or classes.
    • Examples: Is an email Spam/Not Spam? Does this patient have High/Medium/Low stroke risk? Is this plant Healthy/Diseased Type A/Diseased Type B?
    • Common Algorithms: Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Naive Bayes, Logistic Regression, Support Vector Machine (SVM).
  • Regression: To predict a continuous numerical value.
    • Examples: What is the predicted price of this house? How many kilograms of maize will this plot likely yield? What is the estimated temperature tomorrow?
    • Common Algorithms: Linear Regression, Ridge Regression, Lasso Regression, Support Vector Regression (SVR).

How do you choose? Often you start by researching which algorithms work well for similar problems; scikit-learn also makes it easy to train several different algorithms on the same data and compare which performs best.
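
One possible sketch of such a comparison, reusing the train/test split from above (the specific algorithms chosen here are illustrative):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Train several candidate classifiers and compare test-set accuracy
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {clf.score(X_test, y_test):.3f}")
```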

Section 5: Checking the Score (Evaluating Your Model)

How good is your trained model? You need to evaluate it using your test set. Key concepts here are:

  • Bias vs. Variance:
    • High Bias (Underfitting): The model is too simple and misses the underlying patterns in the data. It performs poorly on both the training data and the test data.
    • High Variance (Overfitting): The model is too complex and learns the training data too well, including noise. It performs extremely well on the training data but poorly on the unseen test data.
    • Goal: Find a model with a good balance – low bias and low variance – that generalizes well to new data.
  • Metrics: scikit-learn provides tools to calculate performance scores using your test set:
    • Accuracy: Overall percentage of correct predictions.
    • Precision: Out of all the times the model predicted “Yes”, how often was it actually “Yes”? (Good for minimizing false positives).
    • Recall: Out of all the actual “Yes” cases, how many did the model correctly identify? (Good for minimizing false negatives).
    • F1 Score: A balance between Precision and Recall.
    • (And others like Specificity, Confusion Matrix…)

By comparing these metrics for different algorithms (or different settings of the same algorithm), you choose the best-performing model for your specific needs.
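
A sketch of computing these metrics with scikit-learn, assuming a binary target (e.g., stroke 0/1) and an already-fitted model:

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix,
)

y_pred = model.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1 Score: ", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# Quick overfitting check: a large gap between these two scores
# (training much higher than test) signals high variance.
print("Train accuracy:", model.score(X_train, y_train))
print("Test accuracy: ", model.score(X_test, y_test))
```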

Section 6: Putting it Online (Exporting Model & Streamlit)

Once you have a trained model you’re happy with in your Jupyter Notebook:

  1. Export/Save: You save the trained model (using Python tools like pickle or joblib) and any necessary preprocessing steps (like data scalers) to files (a sketch of both steps follows this list).
  2. Build Web App (Streamlit): Streamlit is a Python library that lets you build interactive web interfaces quickly using Python scripts. Your Streamlit app script will:
    • Load the saved model file(s).
    • Create user interface elements (sliders, text inputs, buttons) for the user to enter data relevant to the prediction (e.g., age, blood pressure).
    • Take the user’s input.
    • Apply the exact same preprocessing steps to the user’s input that were applied to the training data.
    • Feed the preprocessed input into the loaded model.
    • Get the prediction from the model.
    • Display the prediction clearly to the user on the web page.
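
To make the two steps concrete, here is a hedged sketch (the file names, features, and widget choices are illustrative, not the lesson's exact code). First, saving from the notebook with joblib:

```python
import joblib

# In the Jupyter Notebook, after training and choosing your best model:
joblib.dump(model, "stroke_model.joblib")
joblib.dump(scaler, "scaler.joblib")
```

Then a minimal Streamlit script (saved as app.py and launched with `streamlit run app.py`) can walk through the steps above:

```python
import joblib
import streamlit as st

# Load the saved model file(s)
model = joblib.load("stroke_model.joblib")
scaler = joblib.load("scaler.joblib")

st.title("Stroke Risk Predictor")

# User interface elements (features here are simplified examples)
age = st.slider("Age", 1, 100, 40)
glucose = st.number_input("Average glucose level", value=100.0)

if st.button("Predict"):
    # Apply the SAME preprocessing used on the training data
    features = scaler.transform([[age, glucose]])
    prediction = model.predict(features)[0]
    st.write("Predicted Stroke Risk:", "High" if prediction == 1 else "Low")
```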

Section 7: Let’s Code (Activities – Stroke Risk App)

Prerequisites Check: Does your team have experience with Python, Pandas, scikit-learn, and environments like Jupyter Notebooks (and possibly VS Code, GitHub Codespaces, or Google Colab)? If yes, proceed! If not, focusing on the simpler AI platforms might be a better fit for the ICT Club.

  • Activity 1 (Jupyter Notebook – Model Training):
    • Mission: Train and evaluate models to predict stroke risk.
    • Task: Download the Kaggle stroke dataset. Follow the detailed video tutorial provided in the lesson (choose the one for your setup – local or online) to:
      • Load and explore the data using Pandas.
      • Preprocess/clean the data.
      • Split into training/testing sets.
      • Train several different classification models using scikit-learn.
      • Evaluate their performance using metrics.
      • Choose the best model and export/save it (and any scaler used).
    • Challenge: Try training an additional algorithm not shown in the video and compare its performance.
  • Activity 2 (Streamlit – Web App):
    • Mission: Build a web app that uses your trained model.
    • Task: Follow the second video tutorial (local/VS Code or online/Codespaces) to:
      • Create a Python script for your Streamlit app.
      • Load the exported model and scaler.
      • Build the user interface with Streamlit widgets (e.g., st.slider, st.selectbox, st.button).
      • Get user input, preprocess it, make a prediction using the loaded model, and display the result (e.g., “Predicted Stroke Risk: Low/High”).
    • Challenge: Try building a similar Streamlit app for the Iris flower classification dataset.

Section 8: Is This Python Path Right for Your Team?

Seriously consider:

  • Do you have team members comfortable with Python programming?
  • Can you set up the required environment (Jupyter, Streamlit, libraries)?
  • Do you have enough time within the ICT Club schedule to handle both the ML coding and building the Streamlit app, alongside all other submission requirements?

This approach offers great power and flexibility but is significantly more complex and time-consuming than using platforms like Teachable Machine. Choose wisely based on your team’s skills and the time remaining.

Section 9: Quick Review (Key Terms)

  • Preprocessing: Cleaning and formatting data for AI algorithms.
  • Scikit-learn: A key Python library for machine learning tasks (splitting, training, evaluating).
  • Streamlit: A Python library for creating web apps easily.
  • Classification Algorithm: Predicts categories (e.g., Yes/No).
  • Regression Algorithm: Predicts numerical values (e.g., Price, Temperature).
  • Bias vs. Variance: Trade-off in model complexity; aiming for a balance.
  • Overfitting: Model learns training data too well, fails on new data (high variance).
  • Underfitting: Model too simple, fails on both training and new data (high bias).
  • Exporting Model: Saving the trained AI model to a file (e.g., using pickle).

Section 10: More Resources

The lesson provides links to further tutorials on Machine Learning with Python/scikit-learn and building apps with Streamlit.

Conclusion

Webale Kwegattako! (Thank you for joining in!) This lesson introduces a powerful, code-based approach to building and deploying AI models. It requires strong Python skills but offers immense flexibility.

If your team is equipped for it, working through the stroke risk prediction activities will be a fantastic learning experience, covering the end-to-end data science process from raw data to a predictive web app.

If this path seems too complex right now, that’s perfectly okay – focus on mastering the tools and platforms that best fit your current skills and the ICT Club goals. Mugende Mumirembe! (Go in peace / Go well!)
