2008-05-06

An intro to SPR's electoral roll data table

How is SPR's electoral roll organized?

SPR distributes the electoral roll, which is updated roughly every quarter, for a total charge of about RM44,000 (about 0.4 sen per voter name). The data can be obtained in three formats: hard copy, PDF soft copy, or MS Access database files.

The PDF and hard copies are arranged in a printer-friendly "report" - essentially smartly-designed output from the database. The data in the PDF and hard copy format are split into parliamentary constituencies, then state assembly seats (DUN), then voting districts (Daerah Mengundi, DM). Within each DM the voters (up to a few thousand) are assigned their temporary NoSiri, from the lowest to the highest IC number within that DM.

PDF file structure

  • Parliamentary seat
    • DUN seat
      • DM or Daerah Mengundi
        • Voters by NoSiri (new IC)

The MS Access data file is one large flat table with 17 columns and as many rows as there are voters. Instead of one big file of 10.9 million voters, the data are usually split into individual files for each state or federal territory. This blog entry discusses the outline of this flat table.

This is a screenshot of the 17-column flat data table. This view was created in a program called phpMyAdmin, but any other database interface program will show a similar layout.

SPR's electoral roll data gazetted for 2008 Feb 5 contains 10,922,139 rows (records), one per voter. Presented in the MS Access database format, it has 17 fields (columns) of information per voter. The 17 fields are shown below, with MALVU's explanation.

Table: SPR Electoral Rolls Contains 17 Columns of Voter Data
FIELD NAME   | DATA TYPE            | MALVU EXPLANATION
-------------|----------------------|------------------
NoSiri       | Number, Long Integer | Serial no. for voter within DM, for this database report
IC           | Text, 12             | National identification number
ICLama       | Text, 12             | Old IC
Nama         | Text, 50             | Voter name
NamaSpouse   | Text, 50             | Spouse name, mostly blank
ICSpouse     | Text, 50             | Spouse IC, mostly blank
NoRumah      | Text, 50             | House number, within lokaliti
Jantina      | Text, 50             | Gender, L for male or P for female
Kodlokaliti  | Text, 50             | 10-digit locality code: PPPSSDDLLL (see breakdown below)
NamaLokaliti | Text, 50             | Name of locality
NamaParlimen | Text, 50             | Name of parliamentary constituency
NamaDM       | Text, 50             | Name of polling district (Daerah Mengundi)
NamaDUN      | Text, 50             | Name of state assembly seat (Dewan Undangan Negeri)
Negeri       | Text, 50             | Name of state
Tahunlahir   | Text, 4              | Year of birth
TM           | Text, 250            | Polling center (Tempat Mengundi), usually a school address
Saluran      | Number, Decimal      | Polling stream, usually a room in a school

Kodlokaliti breakdown (PPPSSDDLLL):
  • PPP for parliamentary code, eg, 049
  • SS for state assembly code, or DUN code, eg, 14
  • DD for polling district code, or DM code, eg, 05
  • LLL for locality code, eg, 007
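
The PPPSSDDLLL layout of Kodlokaliti is easy to take apart in code. Below is a minimal Python sketch, assuming the code is stored as a plain 10-digit string. The names LocalityCode and parse_kodlokaliti are our own illustrative names and not part of SPR's database.

```python
from typing import NamedTuple


class LocalityCode(NamedTuple):
    """Parts of a Kodlokaliti value (hypothetical container, not an SPR type)."""
    parlimen: str  # PPP, parliamentary code
    dun: str       # SS, state assembly (DUN) code
    dm: str        # DD, polling district (DM) code
    lokaliti: str  # LLL, locality code


def parse_kodlokaliti(code: str) -> LocalityCode:
    """Split a 10-digit Kodlokaliti string into its PPP/SS/DD/LLL parts."""
    code = code.strip()
    if len(code) != 10 or not code.isdigit():
        raise ValueError(f"expected 10 digits, got {code!r}")
    # Keep the parts as strings so that leading zeros (eg, 049) survive.
    return LocalityCode(code[0:3], code[3:5], code[5:7], code[7:10])


parts = parse_kodlokaliti("0491405007")
print(parts.parlimen, parts.dun, parts.dm, parts.lokaliti)
```

Using the example values from the breakdown (049, 14, 05, 007), the full code would read 0491405007.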

Further comments:
  • NoSiri is a temporary number assigned to each voter according to his or her listing order within the voting district (daerah mengundi, DM), for a particular update of the database. It runs from 1 to a few thousand. Within each DM, voters are arranged from the lowest to the highest new IC number, and then each is assigned a NoSiri. NoSiri will change with each edition of the electoral roll because a new registrant, for example, may have an IC that falls anywhere within the DM list, shifting every voter listed after it.
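
To see why NoSiri is unstable between editions, here is a small Python sketch of the numbering as we understand it. It is our own reconstruction, not SPR's program. The function name assign_nosiri, the 7-digit DM key and the sample IC numbers are all made up for illustration.

```python
from collections import defaultdict


def assign_nosiri(voters):
    """Number voters 1..n within each DM, in ascending IC order.

    voters: iterable of (dm_code, ic) string pairs.
    Returns a list of (dm_code, ic, nosiri) tuples.
    """
    by_dm = defaultdict(list)
    for dm_code, ic in voters:
        by_dm[dm_code].append(ic)
    rows = []
    for dm_code in sorted(by_dm):
        # For equal-length digit strings, string order equals numeric order.
        for nosiri, ic in enumerate(sorted(by_dm[dm_code]), start=1):
            rows.append((dm_code, ic, nosiri))
    return rows


edition_1 = [("0491405", "800101015555"), ("0491405", "600101015555")]
# A new registrant whose IC falls between the two existing ones.
edition_2 = edition_1 + [("0491405", "700101015555")]

print(assign_nosiri(edition_1))
print(assign_nosiri(edition_2))
```

In the second edition the voter with the highest IC moves from NoSiri 2 to NoSiri 3, which is why NoSiri cannot be used to track a voter across editions.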

  • IC, or the new IC number, is always 12 digits long, and is the most important column.
  • SPR has apparently made IC a "unique key", which in database parlance means there can be only one occurrence of any IC. SPR's database program probably ensures that an "exact" duplicate IC cannot be entered.
  • The IC column cannot contain a NULL or blank value. SPR has apparently programmed its database to always have a 12-digit number in the IC field.
  • Our guess is that around mid-2007 or thereabouts, SPR went through a "cleaning" campaign to eliminate all voters without a new IC number, on grounds of incomplete data (Penandaan Pemotongan - Pengenalan Diri Tidak Lengkap, or MALVU code 31).

  • Duplicate IC? Election monitoring NGOs persistently ask: if we search the entire nation's voter database, will we find one voter registered in 2 places? IC is the most natural search criterion because of its uniqueness.
  • We have searched the entire country-wide electoral roll of 2008 Feb 5, with more than 10.9 million voters, and found zero "exact" duplicate ICs.
  • However, the electoral roll does contain at least 770 sets of "twin" or similar ICs, which we will cover later in another blog entry. An example is the co-existence of something like T0123456 and T123456.
  • This means any duplicate registration of voters, if there is any, would have to be done through different IC numbers or variations of the same IC number.
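
The two kinds of search described above can be sketched in Python as follows. This is only an illustration of the idea, not the exact procedure we ran. The normalisation rule in twin_key (ignore case and punctuation, drop zeros at the start of the digit part) is an assumption chosen to match the T0123456 / T123456 example, and real "twin" ICs may differ in other ways.

```python
import re
from collections import defaultdict


def exact_duplicates(ics):
    """Return the set of ICs that occur more than once as exact strings."""
    seen, dups = set(), set()
    for ic in ics:
        if ic in seen:
            dups.add(ic)
        seen.add(ic)
    return dups


def twin_key(ic):
    """Hypothetical normalisation used to group similar-looking ICs."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", ic).upper()
    # Optional letter prefix, then zeros to drop, then the remaining digits.
    match = re.fullmatch(r"([A-Z]*)0*(\d+)", cleaned)
    if match is None:
        return cleaned
    return match.group(1) + match.group(2)


def twin_sets(ics):
    """Return sorted groups of distinct ICs that share the same twin_key."""
    groups = defaultdict(set)
    for ic in ics:
        groups[twin_key(ic)].add(ic)
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = ["T0123456", "T123456", "800101015555"]
print(exact_duplicates(sample))
print(twin_sets(sample))
```

On this sample the exact search finds nothing, while the looser key puts T0123456 and T123456 into one set.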

  • Objection: possible duplicates from a delayed gazette? How does SPR handle duplication that can arise when the gazetting of some constituencies is delayed? To illustrate the problem, say IC#123456789012 moved out of parliamentary seat P049 and into P048 (a few km away) during 2007q4. However, the P049-2007q4 roll was disputed and not gazetted in the 20080205 roll; P049-2007q3 was gazetted instead. We now potentially have this voter listed in both P049-2007q3 and P048-2007q4, both gazetted for electoral roll 20080205. It seems SPR's solution is to delete IC#123456789012 from one of the quarterly rolls before merging them into the gazetted 20080205 roll, to avoid a duplicate. We will cover this topic in another blog entry.
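
A rough Python sketch of such a merge is given below. We do not know SPR's actual rule. Keeping the record from the later quarter is purely our assumption, and the function name merge_rolls and the data layout are hypothetical.

```python
def merge_rolls(rolls):
    """Merge quarterly rolls into one roll with at most one record per IC.

    rolls: iterable of (quarter, records), where quarter is a string such as
    "2007q3" and records is a list of (ic, parlimen) pairs.
    ASSUMPTION: when an IC appears in two quarters, the later quarter wins.
    """
    merged = {}
    for quarter, records in rolls:
        for ic, parlimen in records:
            kept = merged.get(ic)
            # "2007q3" < "2007q4" holds under plain string comparison.
            if kept is None or quarter > kept[0]:
                merged[ic] = (quarter, parlimen)
    return {ic: parlimen for ic, (_quarter, parlimen) in merged.items()}


rolls = [
    ("2007q3", [("123456789012", "P049")]),  # older P049 roll, gazetted instead of 2007q4
    ("2007q4", [("123456789012", "P048")]),  # voter has since moved to P048
]
print(merge_rolls(rolls))
```

Under this assumed rule the voter in the example ends up listed once, in P048.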

  • ICLama, or old IC Number is less important but helps confirm identity. It can be left blank.
  • There are a number of errors and inconsistencies in ICLama, which we will list in the next blog entry on old IC numbers.

  • Nama is the data field (column) for the voter's name. There are some inconsistencies in SPR's input format (or perhaps in the NRIC department's input), such that variations of the Malay patronymic connector, such as BIN, B, BN., BINTI, BT, BINTE, BTE., etc, do give rise to some duplicate registrations. We will cover this topic in another blog entry later.
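
One way to compare names despite these variations is to map every variant to a single form before matching. The Python sketch below is only an illustration. The variant lists are taken from the examples above and are certainly incomplete, and the function name normalise_nama is our own. Treating a lone "B" as BIN is a guess that can misfire on a genuine initial.

```python
import re

# Hypothetical variant lists, based only on the examples in this entry.
BIN_VARIANTS = {"BIN", "B", "BN"}
BINTI_VARIANTS = {"BINTI", "BT", "BINTE", "BTE"}


def normalise_nama(nama):
    """Upper-case a name and map BIN/BINTI variants to one spelling each."""
    tokens = re.sub(r"[.,]", " ", nama.upper()).split()
    out = []
    last = len(tokens) - 1
    for i, token in enumerate(tokens):
        # Only treat a token as a connector when it sits between two names.
        is_middle = 0 < i < last
        if is_middle and token in BIN_VARIANTS:
            out.append("BIN")
        elif is_middle and token in BINTI_VARIANTS:
            out.append("BINTI")
        else:
            out.append(token)
    return " ".join(out)


print(normalise_nama("Ahmad b. Ali"))
print(normalise_nama("SITI BTE. HASSAN"))
```

Two records whose normalised names and years of birth agree would then be candidates for a closer manual check, not proof of a duplicate.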
