PeD revisited.
I have been contracted by RTI to revise and extend the 2007 research described in the Original report. The primary goals in this phase are (were) to:
- base quantities on the three month period prior to a price change and the three month period following a price change
- capture multiple price changes
- maintain time and chronological order
The original analysis took the mean quantity at each distinct price. This ignored confounding influence of other interventions and other trends over time. This analysis constrains the observation window to a period long enough to reduce the affect of anomalies in individual monthly reports yet short enough to minimize the effects of time. The length of the window is a parameter so we can investigate shorter and longer intervals.
The month in which the price change was observed is not included. Siince we do not know when during the month of observation the price change occurred, we cannot attribute the quantity to either the pre-price change or the post-price change.
We also wanted to observe the influence of multiple price changes to investigate the outcomes of one larger price change vs. multiple smaller price changes.
By record price change time, we can later investigate the effects of policy changes during the study period.
The methodology reuses the database work of the 2007 study including the monthly report data conversion and the matching of price and quantity variables. It diverges from the original at the main query on monthly reports. The original query used aggregate functions to obtain mean quantities.
select NGO,CLINIC,CLINIC_A as UR,\""+pv+"\" as SERVICE,"+
pv+" as PRICE,avg("+qv+")as QTY "+
"from monthlyreports group by CLINIC,PRICE order by CLINIC,PRICE
The new query
select NGO,CLINIC,MNTHYR,\"$row->p\" as SERVICE,
$row->p as PRICE,$row->q as QTY from monthlyreports order by NGO,CLINIC,MNTHYR
The origiinal query did most of the "heavy lifting" while the newer query produces individual observations, each set (ngo,ckinic,service) ordered on time.
Looking at a time sequence, the logic is intricate in detail but simple as a whole. We step through the times series comparing each observation with its predecessor (e.g. ngo0 and ngo.
We require three variabls to be "present" - NGO, CLINIC and QTY. The obervations without an identifying NGO and CLINIC would get "lumped" together and we don't know whether a "zero" quantity is an observed "zero" or missing data.
There are five conditions which imply five distinct "actions"
- ($ngo0 == $ngo) and ($clinic0 == $clinic) and ($p == $price)
- ($ngo0 == $ngo) and ($clinic0 == $clinic) and ($p == $price) and ($npc > 0) and ($nq1 < $mq)
- ($ngo0 == $ngo) and ($clinic0 == $clinic) and ($p == $price) and ($npc > 0) and ($nq1 == $mq)
- ($ngo0 == $ngo) and ($clinic0 == $clinic) and ($p <> $price) and ($nq0>=$mq))
- ($ngo0 <> $ngo) or ($clinic0 <> $clinic) or (($ngo0 == $ngo) and ($clinic0 == $clinic) and ($p <> $price) and ($nq0<$mq))
In ordinary langauge these are:
- ngo, clinic and price match
- ngo, clinic and price match and the number of prices changes is greater than 0 and the number of post price change quantity observations is less than the threshold
- ngo, clinic and price match and the number of prices changes is greater than 0 and the number of post price change quantity observations exactly equals the threshold
- ngo and clinic match but there is a price change and we have "n" pre price change samples
- there is a "new" ngo or clinic or a price change without "n" pre change samples
The corresponding actions are:
- preserve the last "n" pre price change quantities
- accumulate the first "n" post price change quantities
- calculate the mean post price change quantities and PeD.
- calculate the pre price change mean quantity, store the time and the pre price change price
- start a new subset of observations
The script(ped.php) is (was) written in PHP which I found more convivial for development than JAVA (the language used in the 2007 study). The complexity of conditional structure in early iterations was replaced by more complex boolean conditions. I found the Boolean analysis more convincing than the topology of if then statements.
The script is special purpose applyiing only to the current case. Perhaps if more of these come along, I will generalize the program. Two simple commands performed the "final" analysis:
./ped.php >mr.txt
grep ped mr.txt | cut --complement -f1 > ped.txt
Importing ped.txt back into MySQL, we query for descriptive measures.
SELECT service, count(ped),avg(ped), stddev(ped) FROM `ped`
group by service,npc order by service
| service | count(ped) | avg(ped) | stddev(ped) |
| ANC1LT | 223 | -0.14996521784235 | 8.5668542905221 |
| ANCRLT | 226 | 0.36414279297522 | 8.3576777563414 |
| ARI | 203 | -0.39803736422009 | 42.015795321498 |
| CDD | 160 | -2.4809770569671 | 31.974950686492 |
| COMDIS | 6 | 1.0730159189552 | 1.7260502350961 |
| CPAC | 2 | -0.0091666667722166 | 0.017500000540167 |
| EPICP | 51 | 5.1702612428104 | 62.746620243018 |
| FPCOUN | 15 | 0.25148147592942 | 0.59745159725326 |
| FPREM | 17 | 0.029411765144152 | 0.19195668424266 |
| HPBCHGSC | 14 | 2.7087301324521 | 3.6068696816243 |
| IMCIXAC | 216 | -0.23677821484981 | 17.20185918219 |
| INJCP | 158 | 8.9696202193965 | 29.719947223731 |
| IUDCP | 43 | 0.028914728483488 | 0.29041010425524 |
| LCC | 149 | 35.21438793808 | 76.719100489867 |
| NDEL | 10 | 0.0051555577432737 | 0.12569922243557 |
| NORCP | 22 | -0.047828282991594 | 0.3820732179818 |
| PNC1 | 240 | 0.50564814656197 | 4.3099363826239 |
| PNCR | 157 | -0.20311588294519 | 2.8781127139036 |
| REFL | 1 | 0.26666668057442 | 0 |
| RTISTI | 169 | 3.0256926880036 | 24.50525986015 |
| TTCP | 65 | -2.8055983143357 | 33.219544347266 |
| VTAMSLSC | 1 | 119 | 0 |
In the current version we can dissagregate by the price change number.
SELECT service, npc, count(ped),avg(ped), stddev(ped) FROM `ped` group by service,npc order by service,npc.
| service | npc | count(ped) | avg(ped) | stddev(ped) |
| ANC1LT | 1 | 173 | 0.20081554205195 | 9.2086387922784 |
| ANC1LT | 2 | 50 | -1.3636666470766 | 5.6586547610084 |
| ANCRLT | 1 | 174 | 0.71219121699018 | 8.3281487110598 |
| ANCRLT | 2 | 52 | -0.80048077969024 | 8.3509154773623 |
| ARI | 1 | 196 | -0.21072236634791 | 42.60923944366 |
| ARI | 2 | 7 | -5.6428573046412 | 18.184335613904 |
| CDD | 1 | 150 | -2.4391533130904 | 32.713876542412 |
| CDD | 2 | 10 | -3.1083332151175 | 17.463503424071 |
| COMDIS | 1 | 6 | 1.0730159189552 | 1.7260502350961 |
| CPAC | 1 | 2 | -0.0091666667722166 | 0.017500000540167 |
| EPICP | 1 | 48 | 6.793402629594 | 63.477473531744 |
| EPICP | 2 | 3 | -20.800000945727 | 41.767328398722 |
| FPCOUN | 1 | 15 | 0.25148147592942 | 0.59745159725326 |
| FPREM | 1 | 17 | 0.029411765144152 | 0.19195668424266 |
| HPBCHGSC | 1 | 14 | 2.7087301324521 | 3.6068696816243 |
| IMCIXAC | 1 | 168 | -1.0480515985262 | 16.237147728582 |
| IMCIXAC | 2 | 48 | 2.6026786280175 | 19.96107070713 |
| INJCP | 1 | 141 | 7.8056737344897 | 29.946715724202 |
| INJCP | 2 | 17 | 18.623529300094 | 25.818926135905 |
| IUDCP | 1 | 42 | 0.028968253749467 | 0.29384682023426 |
| IUDCP | 2 | 1 | 0.026666667312384 | 0 |
| LCC | 1 | 137 | 31.6372854888 | 70.061519243416 |
| LCC | 2 | 12 | 76.052974234025 | 123.40323192091 |
| NDEL | 1 | 9 | 0.034617285321777 | 0.094212254414737 |
| NDEL | 2 | 1 | -0.25999999046326 | 0 |
| NORCP | 1 | 22 | -0.047828282991594 | 0.3820732179818 |
| PNC1 | 1 | 185 | 0.16196696600197 | 4.1714659860231 |
| PNC1 | 2 | 55 | 1.661666662991 | 4.5598468632342 |
| PNCR | 1 | 139 | -0.14969166870758 | 2.6129566417843 |
| PNCR | 2 | 18 | -0.61566953733563 | 4.3971323843244 |
| REFL | 1 | 1 | 0.26666668057442 | 0 |
| RTISTI | 1 | 143 | 3.4966422546316 | 25.980010483609 |
| RTISTI | 2 | 26 | 0.4354700715496 | 13.530932409099 |
| TTCP | 1 | 53 | -1.6072851626097 | 33.303101017407 |
| TTCP | 2 | 12 | -8.0981480677923 | 32.320817816939 |
| VTAMSLSC | 1 | 1 | 119 | 0 |
The tab-delimited output was also imported into PSPP, an Open Source version of SPSS and saved as an SPSS compatible file (ped.sav).
Original report.
The task is to produce estimates of Price elasticity of Demand (PeD) from monthly reports by NGO clinics in the NSDP project containing monthly price and quantity data. The work was performed under contract with Research Triangle Institute (RTI) under the direction of Dr. Dennis Chao and funded by the US Agency for International Development (USAID). Hopefully thse estimates will contribute to a management plan for sustainability.
We began with an SPSS dataset (Pricing Dataset_Master_29 May 2007_cp.sav) of monthly reports consisting of 10376 observations of 231 named variables. This dataset was saved as a tdf file from within SPSS (monthlyreports.dat).
We used MySQL for data manipulation and Java to script the calculations. The results were saved as a TDF file (ped.tdf). We started by creating the database 'nsdpprice';
Using the information from SPSS using "Display Data Info", an SQL CREATE statement was created and executed using the MySQL Query Browser. The data was imported using the MySQL LOADDATA command.
Next "price" and "quantity" variable names were matched by inspection and used to create the MySQL table "pq". Price in most cases came from a variable name ending in CP. Pricing of SMC commodities was disaggregated by brand but the quantities were not. I simply chose one of the brands that suggested the most pricing activity.
We now have two MySQL tables - one containing monthly reports and another matching price and quantity variables names in the monthly reports table. The strategy is to pass the monthly reports table once for each pair in the price and quantity table. To do this we used a combination of MySQL and Java.
The java program flow is roughly:
- initialize the MySQL driver - com.mysql.jdbc.Driver
- connect to the database nsdpprice
- obtain an ordered set of price variable and quantity variable pairs
- for each price and quantity variable pair
- obtain an ordered set of {NGO, CLINIC,UR, SERVICE, PRICE, avg QTY }
- for each monthly report
- if UR, NGO, CLINIC and SERVICE match the previous record
- compute
- dq =difference in quantity
- dp = difference in price
- dpp = difference in price relative to averge price of the successive observations
- dqq = difference in quantity relative to average quantity
- ped=dqq/dpp
- write the estimate of ped to ped.tdf
Thus we have 3869 estimates of PeD, most of which some case for validity can be made. We can filter the exceptions in SPSS.
"ped.sav" is the final SPSS product. It was produced by importing the
file ped.tdf. After filtering the bogus data and taking the means of ped by service we have:
MEANS
TABLES=ped BY service
/CELLS MEAN COUNT STDDEV .
PeD | SERVICE | Mean | N | StdDeviation |
| ANC1LT | −0.064 | 292 | 1.121 |
| ANCRLT | 0.027 | 268 | 1.238 |
| ARI | −0.491 | 268 | 1.900 |
| CDD | −0.637 | 256 | 1.981 |
| COMDIS | −1.885 | 34 | 7.114 |
| CONGCP | −0.208 | 121 | 3.915 |
| CONPEN | −1.927 | 125 | 609.297 |
| CPAC | 0.715 | 2 | 1.577 |
| CSEC | 1.000 | 6 | 0.000 |
| EPICP | 0.050 | 79 | 0.628 |
| FPCOUN | 0.557 | 28 | 1.051 |
| FPREM | 0.143 | 40 | 1.864 |
| HOMEDE | 1.057 | 49 | 0.885 |
| HPBCHG | 0.466 | 14 | 0.162 |
| IMCIXA | −0.588 | 305 | 1.880 |
| INJCP | −0.234 | 233 | 3.100 |
| IUDCP | 0.196 | 56 | 1.260 |
| LCC | 0.623 | 229 | 1.428 |
| NDEL | 1.104 | 23 | 1.245 |
| NORCP | −0.571 | 40 | 2.293 |
| PILFEM | 6.478 | 130 | 378.521 |
| PILGCP | 0.342 | 111 | 0.626 |
| PNC1 | 0.087 | 295 | 1.399 |
| PNCR | −0.125 | 281 | 3.148 |
| REFL | 0.195 | 2 | 0.177 |
| RTISTI | −0.267 | 248 | 1.714 |
| STRL | 0.607 | 3 | 0.681 |
| TTCP | −0.319 | 45 | 0.769 |
| Total | 0.047 | 3583 | 134.229 |
Price increases were generally successful (> -1.0) except for COMDIS (Communicable diseases). CONPEN (Commercial condoms - Panther) and PILFEM.. CONPEN and PILFEM were suspect because of the inability to match price and variable data in the first place.
The next band of interest are those services where PeD is less than
0.0 but not less than -1.0. These are the cases where price increases
had the expected result of reducing quantity. The closer to -1.0 the
greater the impact. The social cost in these cases may exceed the
increased revenue.
The unexpected results, PeD greater than 0.0, may be explained by the
perception that those services were of more value at a higher price.
This study illustrates the use of multiple tools in the analysis of institutional and administrative data.