
OVERVIEW

The Teacher Demand and Supply Model (TDSM) helps Jordan's Ministry of Education (MoE) and Ministry of Higher Education and Scientific Research (MoHESR) make data-driven decisions and prepare multi-year supply and demand projections for grade K-12 teachers of Jordanian students in the Kingdom's public schools. The model uses data entered into the Open Education Management Information System (OpenEMIS¹⁾) to calculate teacher surpluses and shortages in each school from 2016 to five years in the future. This allows policymakers to recognize trends and shifts and to take them into account when considering policies and incentives that affect teacher recruitment and retention. TDSM will support stakeholders across the education sector in their efforts to plan strategically; it will inform the MoE, the Civil Service Bureau (CSB), and university partners about where teachers are going to be needed and for which subjects. The TDSM workflow is depicted in the illustration below.

USER GUIDE

How to visualize the data

Browse to the Teacher Demand and Supply Model https://data.emis.moe.gov.jo/tdsm/

Charts

Choose from Filter choices: Click on the 'x' to remove individual filtered choices or click 'Clear Filters' to remove all

Region (Multi-select option) (North, Middle or South). The choices in the filters below cascade based on the selected Region.
Governorate (Multi-select option)
Directorate (Multi-select option)
Liwa (Multi-select option)
School (Multi-select option)
Cohort (K-3, Female 4-12, Male 4-12) (Multi-select option)
Specialization (Arabic is the default, no multi-select due to duplicate count of teachers)

Choose Group By preference:

All (of Jordan)
Region
Governorate
Directorate
Liwa (District)
Specialization
Cohort

Choose Years to forecast and compare. Years before the current year are ACTUAL data. Years after the current year are FORECASTS.

Click on a Year to add to the visualization (highlighted in BLUE)
Click on a Year to remove from the visualization (no highlight)
To show Years in chronological order in the chart, choose them in chronological order.

Choose visualization preference:

Teacher Supply/Demand Chart (default)
Line Chart
Teacher Excess/Teacher Needed Chart
School Map

Scroll down to the Data Table to see more details.

School Map

Filtered by Specialization ('Arabic' by default).
Choose filters to display schools on the map.
Click on the numbers on the map to drill down to a geographic area.
Continue to click on the numbers to get to the blue school points.
Click on a blue school point to get the school-level data on Teachers Needed.

How to download the data

Users can download the TDSM data as an Excel spreadsheet. To do so, scroll down the TDSM page until you see the “Download data as Excel” button at the top of the data table. Click on this button and save the file to your computer.

Data dictionary

The TDSM data download contains the following columns:

School name: The name of the school. Values in Arabic only.
Region: The region the school is in. Values in Arabic and English.
Governorate: The governorate the school is in. Values in Arabic and English.
Liwa: The liwa the school is in. Values in Arabic and English.
Directorate: The educational directorate the school is in. Values in Arabic and English.
Latitude: The latitude of the school.
Longitude: The longitude of the school.
Cohort: One of three groups of students: kindergarteners through third-graders (K-3), females in fourth through twelfth grade (Female 4-12), and males in fourth through twelfth grade (Male 4-12). Values in Arabic and English.
Specialization: The teacher specialization. Kindergarten teachers have a specialization of “Kindergarten”, first- through third-grade classroom teachers have a specialization of “Classroom_123”, and all other teachers are assigned their subject as their specialization. For a list of all specializations, see the specializations_crosswalk.csv lookup table. Values in Arabic and English.

The following columns are repeated for each year from 2016 to five years in the future, with the year indicated in parentheses after the column name. For example: “Teachers(2016)”, “Teachers(2017)” … “Teachers(2026)”, “Teachers(2027)”.

Students: For past years, the actual number of students in the cohort and school. For future years, the projected number of students in the cohort and school.
Sections: For past years, the total number of actual sections for all grades and tracks in the cohort and school. For future years, the total number of sections at the school during the most recent past year, distributed across grades, tracks, and gender so that the largest section is as small as possible. Therefore, while the total number of sections at a school remains constant during all future years, the number of sections in a given cohort at the school may change from one future year to the next. For a list of all grades and tracks see the grades_crosswalk.csv lookup table.
Maximum section size: The number of students in the largest section in that cohort and school that year. Section size is calculated as the number of students divided by the total number of sections for each school, grade, track (and gender for grades 4-12). Section sizes may therefore be non-whole numbers (e.g., 29.5, 32.33).
Lesson hours needed: The number of sections for each grade and track within the cohort at the school multiplied by the number of lessons required per week for the specialization, summed together. See the weekly_classes_per_specialization.csv lookup table for the number of lessons required for each specialization, grade, and track.
Teachers: For past years, the actual number of teacher FTEs (full-time equivalents) for the specialization at the school. For future years, the projected number of teacher FTEs for the specialization at the school. In cases where a teacher taught more than one specialization in the same year, their FTE is apportioned proportional to the number of sections they taught in each specialization. Part-time teachers contribute a corresponding partial FTE.
Teacher weekly maximum capacity: The estimated number of hours each full-time teacher can teach per week. For past years, this value is calculated by taking the median of the lesson hours needed per teacher FTE per school within that directorate and specialization. Decimal values are rounded up to the next integer. Values over 24 are capped at 24. For future years, teacher weekly maximum capacity is the value from the most recent past year for the same directorate and specialization.
Teachers needed: [Lesson hours needed] divided by [Teacher weekly maximum capacity].
Teacher surplus or shortage: [Teachers] minus [Teachers needed]. Positive values indicate a surplus of teachers; negative values indicate a shortage.
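The last few column definitions can be illustrated with a short Python sketch. This is not the TDSM source code: the function names and all sample figures are made up for illustration, and only the rules themselves (sum of sections times weekly lessons, median rounded up and capped at 24, division, subtraction) come from the definitions above.

```python
import math
from statistics import median

MAX_WEEKLY_HOURS = 24  # cap stated in the data dictionary


def weekly_capacity(hours_per_fte_by_school):
    """Median lesson hours per teacher FTE across schools, rounded up, capped at 24."""
    return min(math.ceil(median(hours_per_fte_by_school)), MAX_WEEKLY_HOURS)


def lesson_hours_needed(sections_by_grade, lessons_per_week_by_grade):
    """Sum of (sections x weekly lessons) over every grade/track in the cohort."""
    return sum(
        sections * lessons_per_week_by_grade.get(grade, 0)
        for grade, sections in sections_by_grade.items()
    )


def surplus_or_shortage(teachers_fte, hours_needed, capacity):
    """[Teachers] minus [Teachers needed]; a negative result is a shortage."""
    teachers_needed = hours_needed / capacity
    return teachers_fte - teachers_needed


# Hypothetical figures for one specialization in one school and cohort.
capacity = weekly_capacity([18.0, 21.5, 26.0])                   # median 21.5 -> 22
hours = lesson_hours_needed({"4": 2, "5": 3}, {"4": 7, "5": 6})  # 2*7 + 3*6 = 32
balance = surplus_or_shortage(1.0, hours, capacity)              # 1.0 - 32/22
print(capacity, hours, round(balance, 2))
```

In this made-up case one teacher FTE covers 22 of the 32 weekly lesson hours, so the school shows a shortage of a little under half a teacher.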

Frequently Asked Questions

QUESTION: Why can't I select ALL or MULTISELECT for SPECIALTIES?
ANSWER: If specialties are grouped together (multi-select or All), surpluses and shortages cancel each other out, which defeats the purpose of the TDSM. For example, if a school has an excess of 2 Arabic teachers and 2 English teachers but needs 2 more teachers each for Math and Science, grouping all specialties would return a prediction of 0 teachers needed when in reality the school needs 4 teachers. At a larger scale, if Jordan needs 100 Arabic teachers and has 100 too many English teachers, the two would cancel each other out if grouped with 'All'.

QUESTION: Why does the prediction show a pattern of exponential or ascending needs/overage for future years?
ANSWER: The model uses the previous years' data to calculate the prediction going forward.
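The cancellation effect described for grouped specialties can be shown with a few lines of Python. The function name is hypothetical and the numbers are the ones from the school example in the answer; this is an illustration, not part of TDSM.

```python
def net_and_true_shortage(balance_by_specialization):
    """Return (net balance when grouped, shortage summed per specialization)."""
    net = sum(balance_by_specialization.values())
    shortage = sum(-b for b in balance_by_specialization.values() if b < 0)
    return net, shortage


# Surplus (+) or shortage (-) of teachers per specialization at one school.
school = {"Arabic": 2, "English": 2, "Math": -2, "Science": -2}
net, shortage = net_and_true_shortage(school)
print(net, shortage)  # 0 4
```

Grouped together the school appears balanced, while counting each specialization separately shows that four teachers are still needed.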

ADMINISTRATOR GUIDE

How to refresh the forecasts

Running the Script

Follow these steps to run the script:

1. Navigate to the data-model

cd path/to/data-model

2. Create and Activate Virtual Environment (if not already present):

If you haven't created a virtual environment, you can create one using the following command:

python -m venv venv

Replace venv with the desired name for your virtual environment.

Then, activate the virtual environment:

source venv/bin/activate

3. Install Dependencies from requirements.txt: If you haven't already installed the required packages, use pip to install them from the requirements.txt file:

pip install -r requirements.txt

4. Run the Script:

Execute the Python script data-module/tdsm.py to fetch the data. This will take 30-40 minutes depending upon your internet connection speed. Allow at least 1 hour to complete this task.

python data-module/tdsm.py

You will then be prompted with the following questions, after which TDSM will fetch all new data from OpenEMIS.

Enter password or leave blank to use the default password: 
Enter username or leave blank to use the default username: 
Enter API key or leave blank to use the default API key:

You will then be prompted with the following:

Enter the beginning year of the academic period for which projections should start or leave blank to use [current year].

For example, if you want TDSM to use all data through the 2023/2024 year to make projections from 2025 onward, you would enter 2025.

The TDSM program will then generate all data and forecasts.

5. Moving Data: After running the script and generating the data, you may need to move data from one directory to another. This step is not specific to the script and can be performed using standard file management techniques. The following files should be moved from data-model/content to the src/js folder:

lkp_location.txt
lkp_cohort.txt
lkp_specialization.txt
lkp_school.txt

Make sure you have the necessary permissions and use commands like mv (on Unix-like systems) or move (on Windows) to relocate the files as needed.
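As an alternative to moving the files by hand, the step can be scripted. The following is a minimal sketch that assumes the source and target folders are data-model/content and src/js relative to where it is run; the function name is hypothetical and this helper is not shipped with TDSM.

```python
import shutil
from pathlib import Path

# File names taken from the list above.
LOOKUP_FILES = [
    "lkp_location.txt",
    "lkp_cohort.txt",
    "lkp_specialization.txt",
    "lkp_school.txt",
]


def move_lookup_files(source_dir, target_dir, filenames=LOOKUP_FILES):
    """Move the lookup files, refusing to start if any of them is missing."""
    source, target = Path(source_dir), Path(target_dir)
    missing = [name for name in filenames if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {source}: {', '.join(missing)}")
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in filenames:
        shutil.move(str(source / name), str(target / name))
        moved.append(target / name)
    return moved


# Example call (paths are assumptions; adjust them to your installation):
# move_lookup_files("data-model/content", "src/js")
```

Checking for missing files before moving anything avoids leaving the front end with a mix of old and new lookup tables.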

The TDSM back end will then fetch the latest data from OpenEMIS if you told it to, recalibrate the models, and generate new forecasts. Wait until you see “UPDATE COMPLETE” in the output window.

Navigate to the TDSM front end to view the updated forecasts.

How to archive a version

Inside the project folder is a file called “archive.sh”. This script archives the old data for the project and creates a record for the archive in the archive list. Run archive.sh from your terminal. The script will ask for a name for the archive and then move all old data to a new folder under Archives.

How to add and modify teacher specializations in TDSM

Whenever TDSM receives a new subject during an OpenEMIS download, it automatically associates it with an existing teacher specialization. If TDSM's mapping rules do not apply to the new subject, “Other” is used as the default specialization. The list of all subject-to-specialization mappings is stored in the subjects.csv file. To change the mapping of a given subject, modify the corresponding value in the specialization column of subjects.csv. The specialization can be one that already exists in the file or a new specialization. TDSM does not overwrite existing subject x specialization mappings and will apply this new mapping to all data the next time it is updated.
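The edit to subjects.csv can be made in a spreadsheet or text editor, or scripted as in the sketch below. Only the specialization column is named in this guide; the "name" column used to find the subject, the function name, and the sample subject are assumptions, so check the actual header row of your subjects.csv first.

```python
import csv


def remap_subject(csv_path, subject_name, new_specialization,
                  name_column="name", specialization_column="specialization"):
    """Set the specialization of every row whose subject name matches.

    name_column is an assumed header; adjust it to match subjects.csv.
    Returns the number of rows changed.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    changed = 0
    for row in rows:
        if row[name_column] == subject_name:
            row[specialization_column] = new_specialization
            changed += 1

    # Write all rows back with the original column order.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return changed


# Example call with a made-up subject:
# remap_subject("subjects.csv", "Robotics", "Computer Science")
```

Keep a backup copy of subjects.csv before running a script like this, since the file is rewritten in place.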

How to add a new student track to TDSM

To add a new student track, e.g., Eleventh grade engineering, to TDSM, first check to see that the OpenEMIS grade label is already contained in the grade_mappings lookup table. If not, add the new grade mapping to this file. For example, you would add “Eleventh grade” in the name column and “11” in the grade column. Next, add the new OpenEMIS track labels, separated by commas, and the corresponding suffix to the track_mappings lookup table. For example, for engineering, you might add “engineering,هندسة” to the names column and “eng” to the track column. Finally, you would create a new row in the weekly_classes_per_specialization lookup table, with “11_eng” in the grade column and the number of hours required under each specialization column. For specializations where instruction is not required, leave the column blank.
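How the three lookup tables fit together can be sketched in Python. This is an illustration under assumptions, not TDSM's actual code: the function names are hypothetical, the tables are represented as plain dictionaries and tuples rather than read from the CSV files, and the values are the engineering example from the paragraph above.

```python
def build_track_lookup(track_rows):
    """Expand (comma-separated names, suffix) rows into a label -> suffix map."""
    lookup = {}
    for names, suffix in track_rows:
        for label in names.split(","):
            lookup[label.strip().lower()] = suffix
    return lookup


def grade_key(grade_label, track_label, grade_mappings, track_lookup):
    """Combine the grade number and track suffix into a key such as '11_eng'."""
    grade = grade_mappings[grade_label]
    suffix = track_lookup.get(track_label.strip().lower()) if track_label else None
    return f"{grade}_{suffix}" if suffix else str(grade)


# Rows as they might appear in grade_mappings and track_mappings.
grade_mappings = {"Eleventh grade": 11}
track_lookup = build_track_lookup([("engineering,هندسة", "eng")])

print(grade_key("Eleventh grade", "هندسة", grade_mappings, track_lookup))  # 11_eng
print(grade_key("Eleventh grade", "", grade_mappings, track_lookup))       # 11
```

The resulting key is the value that would go in the grade column of the new weekly_classes_per_specialization row.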

How to update required class hours for a student track

The weekly hours of required instruction for each student track are contained in the weekly_classes_per_specialization lookup table. To adjust the required hours of instruction for a given student track, modify the values in this file. Leave the cells blank for specializations where no instruction is required.

How to add new OpenEMIS locations to TDSM

Each directorate is uniquely identified in OpenEMIS by an area_id. A list of all OpenEMIS area_ids can be found in the areas.csv table, which is updated each time the TDSM data are refreshed. Additionally, each school in OpenEMIS is given a more granular administrative_area_id. Administrative_area_ids are nested within area_ids. Because OpenEMIS does not have unique identifiers for liwas, each known administrative_area_id+area_id pair has been mapped to a liwa in the administrative_area_crosswalk.csv file. If an administrative_area_id is not mapped in this table, any school with that administrative_area_id will not be included in TDSM. All unmapped schools are listed in the unmapped_schools.csv file. To add a new administrative_area_id+area_id pair to TDSM, add a new row to the administrative_area_crosswalk.csv file, filling in a value for each column. The next time the TDSM data are refreshed, the newly mapped administrative_area_ids will be included.
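The effect of the crosswalk on which schools are included can be sketched as follows. The function name, the dictionary layout, and all IDs and names are made up for illustration; only the rule (a school is kept when its administrative_area_id+area_id pair has a liwa, and listed as unmapped otherwise) comes from the description above.

```python
def split_by_mapping(schools, crosswalk):
    """Separate schools whose (administrative_area_id, area_id) pair has a liwa."""
    mapped, unmapped = [], []
    for school in schools:
        key = (school["administrative_area_id"], school["area_id"])
        liwa = crosswalk.get(key)
        if liwa is None:
            unmapped.append(school)                   # would go to unmapped_schools.csv
        else:
            mapped.append({**school, "liwa": liwa})   # included in TDSM
    return mapped, unmapped


# Hypothetical crosswalk rows: (administrative_area_id, area_id) -> liwa.
crosswalk = {(101, 10): "Liwa A", (102, 10): "Liwa B"}
schools = [
    {"name": "School 1", "administrative_area_id": 101, "area_id": 10},
    {"name": "School 2", "administrative_area_id": 999, "area_id": 10},
]

mapped, unmapped = split_by_mapping(schools, crosswalk)
print([s["name"] for s in mapped], [s["name"] for s in unmapped])
```

Adding a row for the pair (999, 10) to the crosswalk would move "School 2" into the mapped list on the next refresh.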

How to refine the forecasting models

The TDSM models automatically recalibrate whenever new data are downloaded from OpenEMIS. However, the models can be further refined by changing the model parameters or the algorithm itself. This can be done by modifying the TDSM back-end code (data-module/tdsm.py) which is written in Python version 3.10.12. See the forecasting models section for details on the models.

TECHNICAL SPECIFICATIONS

Version history

TDSM version 1.0 was deployed February 2022. Version 2.0 is in final testing and scheduled to deploy on…

	TDSM 1.0	TDSM 2.0
Data feed	Files manually generated from OpenEMIS and sent to TDSM	Automatically pulled from OpenEMIS via its API
Granularity	Directorate-level	School-level
Teachers needed calculation	Based on user-defined class size	Based on number of sections in each grade, track, and school
Forecasting algorithm	Ridge regression	Extreme gradient boosting
Visualizations	Bar Chart, Pie Chart, Line Chart, Surplus/Shortage Chart	Teacher Demand/Supply Chart, Line Chart, Teacher Excess/Teacher Needed Chart, Schools Map

Forecasting models

TDSM predicts the count of Jordanian students and civil service teacher FTEs in public schools for the next five years. On execution, TDSM retrieves all student and teacher records from 2016 onward from OpenEMIS, recalibrates its models, and generates fresh forecasts. Forecasts include only schools administered by the Ministry of Education and exclude Syrian students and contract teachers.

The student and teacher forecasting models use a machine learning algorithm called extreme gradient boosting. Both models are generated by the Python implementation of XGBoost version 1.7.6. Function call: XGBRegressor(objective ='reg:squarederror', colsample_bytree = 0.3, learning_rate = 0.5, max_depth = 5, alpha = 100, reg_lambda=100, n_estimators = 500). Below are the model predictors (features) and their relative importance in each model as of April 2024. As the models are recalibrated on additional data over time, the importance of each predictor may change.

The x-axis in the figures above denotes the F score, or how often the model used each predictor to make its forecasts. Predictors with higher F scores were used more often and therefore were more important to the model.
students_prev_year is the number of students at the same school and in the same grade the previous year.
students_prev_year_and_grade is the number of students at the same school in the previous grade the previous year. For kindergarteners and first-graders, this value was set to students_prev_year.
grade_cat is a categorical variable representing the grade (K-12) of the students. For grades 11 and 12, grade_cat also includes the track, for example, 11th grade agriculture track or 12th grade science track.
governorate_en_cat is the governorate in which the students' or teachers' school is located.
directorate_en_cat is the directorate in which the students' or teachers' school is located.
gender_cat is the gender of the students or teachers.
specialization_en_cat is the teachers' area of specialization. All Kindergarten teachers were assigned “Kindergarten” as their specialization and all grade 1-3 teachers other than English and French teachers were assigned “Classroom_123” as their specialization. English and French teachers and all teachers in grades 4-12 were assigned their specific specialization, such as Arabic or Mathematics.
fte_prev is the sum of all teacher full-time equivalents (FTEs) in the previous year for that specialization and school. fte_prev2 is the same value from two years prior. Teachers who taught multiple specializations during the same school year had their FTE split across specialization proportional to the number of classes they taught in each specialization. Civil service teachers working less than a full FTE in a given year contributed the appropriate partial FTE.
age_prev is the median age in years of the teachers in that specialization and school during the previous school year. Individual teacher ages were bottom-coded at 21 and top-coded at 90. For projections 2+ years in the future, age_prev was extrapolated using extreme gradient boosting, with specialization, gender, and median age from the past two years as predictors.
pandemic is a categorical variable with three possible values: 2020/21, 2021/22, and all other years.
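The FTE apportionment described for fte_prev can be illustrated with a small Python sketch. The function name and the sample teacher are hypothetical; the only rule taken from the text is that a teacher's FTE is split across specializations in proportion to the number of classes taught in each.

```python
def apportion_fte(total_fte, classes_by_specialization):
    """Split one teacher's FTE across specializations by share of classes taught."""
    total_classes = sum(classes_by_specialization.values())
    if total_classes == 0:
        return {}
    return {
        spec: total_fte * n / total_classes
        for spec, n in classes_by_specialization.items()
    }


# A full-time teacher with 3 Arabic classes and 1 class in another specialization.
full_time = apportion_fte(1.0, {"Arabic": 3, "Islamic Education": 1})
# The same class load for a teacher working half time.
part_time = apportion_fte(0.5, {"Arabic": 3, "Islamic Education": 1})
print(full_time, part_time)
```

Summing these shares over all teachers in a school gives the per-specialization FTE totals that features such as fte_prev are built from.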

Lookup Tables

The following tables are used by TDSM to compile data and generate forecasts.

weekly_classes_per_specialization.csv: Specifies the weekly hours of instruction each grade should receive in each specialization. For any grade + stream combination not included in this file, TDSM defaults to the values listed for that grade without any stream. All current grade + stream combinations are listed below.
- G-1: Pre-Kindergarten
- G0: Kindergarten
- G1: First grade
- G2: Second grade
- G3: Third grade
- G4: Fourth grade
- G5: Fifth grade
- G6: Sixth grade
- G7: Seventh grade
- G8: Eighth grade
- G9: Ninth grade
- G10: Tenth grade
- G11: Eleventh grade (default values for eleventh graders who are not in one of the streams listed below.)
- G12: Twelfth grade (default values for twelfth graders who are not in one of the streams listed below.)
- G11_sci, G12_sci: Eleventh and twelfth grade science stream
- G11_ind, G12_ind: Eleventh and twelfth grade industrial stream
- G11_lit, G12_lit: Eleventh and twelfth grade literature stream
- G11_home, G12_home: Eleventh and twelfth grade home economics stream
- G11_ag, G12_ag: Eleventh and twelfth grade agriculture stream
- G11_hotel, G12_hotel: Eleventh and twelfth grade hospitality stream
grade_mappings.csv: Used by TDSM to translate grade labels from OpenEMIS into grade integers, e.g., “Tenth grade” maps to the value 10.
track_mappings.csv: Used by TDSM to translate track labels from OpenEMIS into standardized track suffixes in TDSM. For example, “agriculture” maps to the suffix “ag”.
area_translations.csv: Provides a crosswalk between OpenEMIS area_id values and English labels.
administrative_area_crosswalk.csv: Links OpenEMIS administrative_area_id values with liwas and area_id values.
school_types.csv: Provides a lookup table of school types in OpenEMIS.
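The fallback rule for weekly_classes_per_specialization.csv (use the grade-only row when a grade + stream combination is missing) can be sketched in Python. This is an assumption-laden illustration rather than TDSM's implementation: the function name is hypothetical, the table is a plain dictionary keyed in the G11_sci style of the list above, blank cells are treated as 0 hours, and the hour values are invented.

```python
def weekly_hours(grade_key, specialization, table):
    """Look up weekly hours, falling back to the grade-only row if needed."""
    row = table.get(grade_key)
    if row is None:
        base_grade = grade_key.split("_", 1)[0]   # "G11_eng" -> "G11"
        row = table.get(base_grade, {})
    return row.get(specialization, 0)             # blank cell treated as 0 hours


# Invented hour values for two rows of the lookup table.
table = {
    "G11": {"Arabic": 4},
    "G11_sci": {"Arabic": 3, "Physics": 4},
}

print(weekly_hours("G11_sci", "Physics", table))  # 4, row exists
print(weekly_hours("G11_eng", "Arabic", table))   # 4, falls back to G11
print(weekly_hours("G11_eng", "Physics", table))  # 0, not in the G11 row
```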

Staging Tables

The following tables are generated by TDSM each time it loads new data from OpenEMIS Jordan. These tables are persisted so that TDSM can be rerun without having to reload data from OpenEMIS.

areas.csv: List of all areas.
institutions.csv: List of all schools.
subjects.csv: List of all existing subjects and the teacher specialization they map to. Also includes the OpenEMIS ID of each subject and the OpenEMIS ID of the academic period during which the subject was first taught.
teacher_subjects.csv: All teachers by subject and year.
teacher_positions.csv: Yearly teacher positions.
staff_types.csv: List of all staff position types.
users.csv: Teacher demographics.
students.csv: All student data.
grades.csv: List of all grade values.
academic_periods.csv: List of all academic periods and their corresponding OpenEMIS ID.

Data files

The following files are generated by TDSM for use by the user interface.

tdsm.txt: Contains all student and teacher data. One record per school per cohort per specialization per year.
lkp_location.txt: Lookup table for location details.
lkp_cohort.txt: Lookup table for cohort values.
lkp_specialization.txt: Lookup table for specialization values.
lkp_school.txt: Lookup table for school details.
unmapped_schools.csv: Lists all schools from OpenEMIS that were not successfully mapped to a liwa. These schools are not included in the TDSM data or projections. To include these schools in TDSM, see “How to add new OpenEMIS locations to TDSM”.

¹⁾

OpenEMIS (https://www.openemis.org) is an open-source education management information system initiated by UNESCO and used by the Kingdom of Jordan to track its students and teachers.
