- Published on
Thesis Thursday 3 - Model and Methodology
- Authors
- Name
This week's Thesis Thursday will be a two part special, partly to make up for last week's missing post. I also want to take the opportunity to document some thoughts and progress. This post is arguably the more technical of the two and focuses on issues relating to the model and methodology of my paper. In the subsequent post I plan to document some analysis on the recipe dataset which I think is very interesting (maybe even more so then the actual thesis itself).
Nonetheless, the model is interesting in its own right and worth thinking through in greater detail. Modelling the effect of immigration on consumption patterns gave me some new insights and additional things to think about.
Model
I will skip the mathematical exposition here and assume some knowledge of economics and functional forms. Modelling the demand of supermarket produce or any good in general is an extremely difficult problem. The main issue being that most models involve too many parameters that cannot be identified and taken easily to the data. Even simple models taught in intermediate micro courses often involve a large matrix of elasticities of substitution and income elasticities that make practical implementation near impossible.
Rather than go down the route of BLP demand models which alleviate the above problem by treating goods as bundles of characteristics and is thus frequently used in marketing or industrial organisation papers, I decided to opt for a simpler and more restrictive model - the constant elasticity of substitution (CES) model. Using a nested CES model allows me to abstract from other less relevant details like the estimation of elasticities. Instead, by assuming that consumers divide their income across classes of goods based on some preference parameter, I can concentrate on the share of income spent within each particular product group.
Nesting the preference of local or foreign products within each class of good, also allows me to focus on foreign consumption share within each product group. As conceptualised in my model, an increase in the number of immigrants from country j increases the preference parameter for goods related to country j, resulting in a positive correlation between migrant flow from country j and consumption share of goods related to country j.
After writing down the model I realise that there are a few additional complications. First, the model assumes that prices are held constant i.e. migrant flow only affects preferences but not prices. This is quite a strong assumption but one that could potentially be checked against the data.
Next, it also assumes that the types of goods available are constant across periods. This seems unlikely to hold since one would imagine that retailers would stock more foreign related goods to cater to migrant communities. Under the CES preference structure, any supposed positive correlation could simply be due to love for variety rather than preference spillovers. One possible way to distinguish between the two is to look for expenditure on products that are available in all areas. If the effect of migrants from country j operates on consumption share solely through product variety, the actual expenditure on products related to country j available in all regions should be lower in areas with high migrant inflow. On the other hand, if the effect operates through preference spillovers, the expenditure share would be higher. In reality both channels should play a role and the direction potentially ambiguous.
Methdology
The baseline model is a regression of consumption share on foreign related goods on foreign born share at the county(i)-geography(j) level:
where and control for unobserved variation at the county or migrant-country level that might be correlated with both the level of foreign born and consumption share. , the coefficient of interest gives us the effect of an increase in foreign born share on consumption share. Under the reasoning outlined in the model section, should be positive.
Given that I am working on a dataset which provides cross-county variation in consumption share in a particular year, one might wonder what the relevant choice of foreign-born share would be. If preference spillovers take some time to materialise, using contemporaneous foreign-born share might not be the best choice. One would still expect it to be correlated since the current share of foreign-born is dependent on the past-share. Using a lag share might provide a better signal but the choice of lag is debatable. Including multiple foreign-born share terms would also not be insightful given that each foreign-born term is a function of another and the true effect is a combination of foreign share and time.1
Endogeneity concerns
For a factor to be considered an omitted variable, it has to vary at the ij level and be correlated with both foreign-born share and consumption-share of natives. While there are potentially many factors that affect the share of foreign born in a particular county, it is not immediately apparent that these factors would also shift the consumption patterns of locals.
One potential concern is that a county's attractiveness to migrants from a particular country might also attract the migration of natives who prefer goods from that country. This is a potential problem as I am unable to observe underlying preferences and internal migration behaviour, treating all U.S. citizens equally as natives.
An instrumental variable strategy could potentially be used to resolve endogeneity concerns. A commonly used instrument in the migration literature is the share of foreign-born in a distant lagged year. This "enclave instrument" proposed by Card has been shown to be correlated with contemporaneous migration flows but I am not certain it satisfies the exclusion restriction when applied to this particular problem.
The instrument should ideally affect contemporaneous consumption expenditure only through foreign-born share. However, using lagged foreign-born share is problematic as it may affect contemporaneous consumption directly. It would also be correlated with the number of restaurants opened by migrants muddying the identification strategy.
Footnotes
This issue seems impossible to resolve without fully specifying a model of how migrant flows affect local preferences across time. ↩