Ricardo’s Theory of Comparative Advantage

Old Idea, New Evidence


The anecdote is famous. A mathematician, Stan Ulam, once challenged Paul Samuelson to name one proposition in the social sciences that is both true and non-trivial. His reply was: ‘Ricardo’s theory of comparative advantage’; see Paul Samuelson (1995, p. 22). Truth in Samuelson’s reply, however, refers to the fact that Ricardo’s theory of comparative advantage is mathematically correct, not that it is empirically valid. The goal of this paper is to assess the empirical performance of Ricardo’s ideas.

To bring Ricardo’s ideas to the data, one must overcome a key empirical challenge. Suppose, as Ricardo’s theory of comparative advantage predicts, that different factors of production specialize in different economic activities based on their relative productivity differences. Then, following Ricardo’s famous example, if English workers are relatively better at producing cloth than wine compared to Portuguese workers, England will produce cloth, Portugal will produce wine, and at least one of these two countries will be completely specialized in one of these two sectors. Accordingly, the key explanatory variable in Ricardo’s theory, relative productivity, cannot be directly observed: for a completely specialized country, we never see how productive its workers would have been in the sector it does not produce.
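Ricardo’s own numerical example makes the distinction between absolute and comparative advantage concrete. The sketch below uses his labour requirements (man-years needed to produce a given quantity of each good). The function names are illustrative and are not taken from this paper. Portugal needs less labour for both goods, yet England has the lower opportunity cost of cloth. Once England specializes in cloth, its labour requirement in wine is exactly the kind of number that would no longer be observable.

```python
# Ricardo's own example: labour (man-years) needed to produce a fixed quantity
# of each good. Portugal needs less labour for both goods.
LABOR_REQ = {
    "England": {"cloth": 100, "wine": 120},
    "Portugal": {"cloth": 90, "wine": 80},
}


def opportunity_cost_of_cloth(country: str) -> float:
    """Units of wine forgone for each unit of cloth the country produces."""
    req = LABOR_REQ[country]
    return req["cloth"] / req["wine"]


def comparative_advantage_in_cloth() -> str:
    """The country with the lower opportunity cost of cloth."""
    return min(LABOR_REQ, key=opportunity_cost_of_cloth)


def absolute_advantage(good: str) -> str:
    """The country that needs the least labour to produce `good`."""
    return min(LABOR_REQ, key=lambda country: LABOR_REQ[country][good])


print(comparative_advantage_in_cloth())                         # England
print(absolute_advantage("cloth"), absolute_advantage("wine"))  # Portugal Portugal
```

Only the ratio of labour requirements within each country enters the comparison, which is why Portugal’s absolute advantage in both goods is irrelevant for the pattern of specialization.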

This identification problem is emphasized by Alan Deardorff (1984) in his review of empirical work on the Ricardian model of trade (p. 476): “Problems arise, however, most having to do with the observability of [productivity by industry and country]. The…problem is implicit in the Ricardian model itself…[because] the model implies complete specialization in equilibrium… This in turn means that the differences in labor requirements cannot be observed, since imported goods will almost never be produced in the importing country.” A similar identification problem arises in the labor literature, in which the self-selection of individuals based on comparative advantage is often referred to as the Roy model. As James Heckman and Bo Honore (1990) have shown, if general distributions of worker skills are allowed, the Roy model (and hence Ricardo’s theory of comparative advantage) has no empirical content. Econometrically speaking, the Ricardian model is not nonparametrically identified.

∗ Costinot: MIT and NBER, Department of Economics, MIT, 50 Memorial Drive, Cambridge, MA (e-mail: costinot@mit.edu). Donaldson: MIT and NBER, Department of Economics, MIT, 50 Memorial Drive, Cambridge, MA (e-mail: ddonald@mit.edu). We thank Pol Antràs, Chang-Tai Hsieh, and Esteban Rossi-Hansberg for comments and Meredith McPhail and Cory Smith for excellent research assistance.

How can one solve this identification problem? One possibility is to make untestable functional form assumptions about the distribution of productivity across different factors of production and economic activities. These assumptions can then be used to relate productivity levels that are observable to those that are not. In a labor context, a common strategy is to assume that workers’ skills are log-normally distributed. In a trade context, building on the work of Jonathan Eaton and Samuel Kortum (2002), Arnaud Costinot, Dave Donaldson, and Ivana Komunjer (2011) have shown how the predictions of the Ricardian model can be tested by assuming that productivity levels are independently drawn from Fréchet distributions across countries and industries.

This paper proposes an alternative empirical strategy that does not rely on identification by functional form. Our basic idea, as in Arnaud Costinot and Dave Donaldson (2011), is to focus on agriculture, a sector of the economy in which the way essential inputs such as water, soil, and climatic conditions map into outputs is uniquely well understood. As a consequence of this knowledge, agronomists are able to predict how productive a given parcel of land, which we will refer to as a ‘field’, would be were it to be used to grow any one of a set of crops. In this particular context, the econometrician therefore knows the productivity of a field in all economic activities, not just those in which it is currently employed.



Our strategy can be described as follows. We first establish how, according to Ricardo’s theory of comparative advantage, total output of various crops should vary across countries as a function of: (i) the vector of productivity of the fields that countries are endowed with and (ii) the producer prices that determine the allocation of fields across crops.¹ We then combine these theoretical predictions with productivity and price data from the Food and Agriculture Organization (FAO). Our dataset consists of 17 major agricultural crops and 55 major agricultural countries. Using this information, we can compute predicted output levels for all crops and countries in our sample and ask: How do predicted output levels compare with those that are observed in the data?

Our empirical results show that the output levels predicted by Ricardo’s theory of comparative advantage agree reasonably well with actual data on worldwide agricultural production. Despite all of the real-world considerations from which Ricardo’s theory abstracts, a regression of log output on log predicted output has a (precisely estimated) slope of 0.21. This result is robust to a series of alternative samples and specifications.


The rest of the paper is organized as follows. Section I derives predicted output levels in an economy where factor allocation is determined by Ricardian comparative advantage. Section II describes the data that we use to construct measures of both predicted and actual output. Section III compares predicted and observed output levels, and Section IV offers some concluding remarks.


I. Ricardian Predictions

The basic environment is the same as in Costinot (2009). We consider a world economy comprising $c = 1, \dots, C$ countries, $g = 1, \dots, G$ goods, and $f = 1, \dots, F$ factors of production. In our empirical analysis, a good will be a crop and a factor of production will be a parcel of land or ‘field’. Factors of production are immobile across countries and perfectly mobile across sectors. $L_{cf} \geq 0$ denotes the inelastic supply of factor $f$ in country $c$. Factors of production are perfect substitutes within each country and sector, but vary in their productivity $A^g_{cf} \geq 0$. Total output of good $g$ in country $c$ is given by

$$Q^g_c = \sum_{f=1}^{F} A^g_{cf} L^g_{cf},$$

where $L^g_{cf}$ is the quantity of factor $f$ allocated to good $g$ in country $c$. The variation in $A^g_{cf}$ is the source of Ricardian comparative advantage. If two factors $f_1$ and $f_2$ located in country $c$ are such that $A^{g_2}_{cf_2}/A^{g_1}_{cf_2} > A^{g_2}_{cf_1}/A^{g_1}_{cf_1}$ for two goods $g_1$ and $g_2$, then field $f_2$ has a comparative advantage in good $g_2$.²

¹ In line with Ricardo’s theory of comparative advantage, the focus of our paper is on the supply side of the economy, not the demand-side considerations that would ultimately pin down prices around the world.
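The output identity and the comparative-advantage condition for a single country $c$ can be written in a few lines. The sketch below stores $A^g_{cf}$ as a goods-by-fields array. The productivity numbers and function names are hypothetical. The condition is cross-multiplied to avoid dividing by a zero productivity, which is equivalent to the ratio form only when the denominators are positive.

```python
import numpy as np


def total_output(A_c: np.ndarray, L_alloc_c: np.ndarray) -> np.ndarray:
    """Q^g_c = sum_f A^g_{cf} L^g_{cf} for every good g in one country c.

    A_c[g, f] is the productivity of field f in good g and L_alloc_c[g, f] is
    the land of field f allocated to good g. Returns a length-G vector.
    """
    return (A_c * L_alloc_c).sum(axis=1)


def has_comparative_advantage(A_c: np.ndarray, f1: int, f2: int, g1: int, g2: int) -> bool:
    """True if field f2 has a comparative advantage in good g2 relative to f1:
    A[g2, f2] / A[g1, f2] > A[g2, f1] / A[g1, f1].

    The inequality is cross-multiplied, which is equivalent whenever
    A[g1, f1] > 0 and A[g1, f2] > 0, and avoids dividing by zero otherwise.
    """
    return bool(A_c[g2, f2] * A_c[g1, f1] > A_c[g2, f1] * A_c[g1, f2])


# Hypothetical example: rows are goods (0 and 1), columns are fields (0 and 1).
# Field 0 is absolutely more productive in both goods.
A_c = np.array([[6.0, 2.0],
                [3.0, 2.0]])
print(has_comparative_advantage(A_c, f1=0, f2=1, g1=0, g2=1))  # True

L_alloc_c = np.array([[1.0, 0.0],   # field 0 grows good 0
                      [0.0, 1.0]])  # field 1 grows good 1
print(total_output(A_c, L_alloc_c))  # [6. 2.]
```

In this example field 1 is less productive than field 0 in both goods, yet it has a comparative advantage in good 1.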

Throughout this paper, we focus on the supply side of this economy by taking producer prices $p^g_c \geq 0$ as given. We assume that the allocation of factors of production to each sector in each country is efficient and solves

$$\max_{\{L^g_{cf}\}} \left\{ \sum_{c=1}^{C} \sum_{g=1}^{G} p^g_c Q^g_c \;\middle|\; \sum_{g=1}^{G} L^g_{cf} \leq L_{cf} \right\}.$$

Since there are constant returns to scale, a competitive equilibrium with a large number of profit-maximizing firms would lead to an efficient allocation. Because aggregate output is linear, the solution of the previous maximization problem is easy to characterize. As in a simple Ricardian model of trade with two goods and two countries, each factor should be employed in the sector that maximizes $A^g_{cf} p^g_c$, independently of where other factors are being employed.
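Because the objective is linear and separable across fields, assigning each field to the good that maximizes $A^g_{cf} p^g_c$ should reach the same value as searching over every possible allocation. The sketch below checks this on a small, randomly generated hypothetical country. None of the numbers come from the paper’s data, and the helper names are illustrative.

```python
import itertools

import numpy as np


def efficient_allocation(A_c: np.ndarray, p_c: np.ndarray) -> np.ndarray:
    """For each field f, the index of the good g that maximizes A^g_{cf} p^g_c."""
    return np.argmax(A_c * p_c[:, None], axis=0)


def allocation_value(assign: np.ndarray, A_c, p_c, L_c) -> float:
    """Value of output, sum_g p^g_c Q^g_c, when field f puts all of L_c[f] into good assign[f]."""
    fields = np.arange(A_c.shape[1])
    return float(np.sum(p_c[assign] * A_c[assign, fields] * L_c))


# Hypothetical country with 3 goods and 5 fields.
rng = np.random.default_rng(0)
G, F = 3, 5
A_c = rng.uniform(0.0, 10.0, size=(G, F))
p_c = rng.uniform(1.0, 5.0, size=G)
L_c = rng.uniform(0.5, 2.0, size=F)

best = efficient_allocation(A_c, p_c)
# Brute force over all G**F whole-field assignments. Splitting a field across
# goods cannot do better, because the objective is linear in each field's land.
brute = max(
    allocation_value(np.array(assign), A_c, p_c, L_c)
    for assign in itertools.product(range(G), repeat=F)
)
print(np.isclose(allocation_value(best, A_c, p_c, L_c), brute))  # True
```

The brute-force search grows as $G^F$ and is only a check. The per-field rule is what makes the problem tractable with hundreds of thousands of grid cells.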


Assuming that the efficient allocation is unique,³ we can express total output of good $g$ in country $c$ at the efficient allocation as

$$Q^g_c = \sum_{f \in F^g_c} A^g_{cf} L_{cf}, \tag{1}$$

where $F^g_c$ is the set of factors allocated to good $g$ in country $c$:

$$F^g_c = \left\{ f = 1, \dots, F \;\middle|\; \frac{A^g_{cf}}{A^{g'}_{cf}} > \frac{p^{g'}_c}{p^g_c} \text{ for all } g' \neq g \right\}. \tag{2}$$

Equations (1) and (2) capture Ricardo’s idea that relative rather than absolute productivity differences determine factor allocation and, in turn, the pattern of international specialization.

² The present model, like the Roy model in the labor literature, features multiple factors of production. In international trade textbooks, by contrast, Ricardo’s theory of comparative advantage is associated with models that feature only one factor of production, labor. In our view, this particular formalization of Ricardo’s ideas is too narrow for empirical purposes. The core message of Ricardo’s theory of comparative advantage is not that labor is the only factor of production in the world, but rather that relative productivity differences, and not absolute productivity differences, are the key determinant of factor allocation. As argued in the main text, the present model captures exactly that idea.

³ In our empirical analysis, 2 out of the 101,757 grid cells in Brazil (the empirical counterparts of factors $f$ in the model) are such that the value of their marginal products $A^g_{cf} p^g_c$ is maximized in more than one crop. Thus the efficient allocation is only unique up to the allocation of these two Brazilian grid cells. Dropping these two grid cells has no effect on the coefficient estimates presented in Table 1.
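Equations (1) and (2) translate directly into array operations for a single country. The sketch below is a minimal version that assumes productivities are stored as a goods-by-fields array and prices as a vector. The names and numbers are hypothetical. It also shows the "relative, not absolute" point: scaling one field’s productivity by the same factor in every good leaves the sets $F^g_c$ unchanged.

```python
import numpy as np


def factor_sets(A_c: np.ndarray, p_c: np.ndarray) -> np.ndarray:
    """Boolean G x F matrix whose row g marks the fields in F^g_c (equation (2)).

    Field f belongs to F^g_c iff A^g p^g > A^{g'} p^{g'} for every g' != g, the
    cross-multiplied form of A^g/A^{g'} > p^{g'}/p^g (equivalent when
    A^{g'}_{cf} > 0 and p^g_c > 0). Fields with ties belong to no set, mirroring
    the uniqueness assumption.
    """
    value = A_c * p_c[:, None]  # value of marginal product of each field in each good
    masks = []
    for g in range(A_c.shape[0]):
        others = np.delete(value, g, axis=0)  # rows g' != g
        masks.append(np.all(value[g] > others, axis=0))
    return np.array(masks)


def predicted_output(A_c: np.ndarray, p_c: np.ndarray, L_c: np.ndarray) -> np.ndarray:
    """Q^g_c from equation (1): sum of A^g_{cf} L_{cf} over the fields in F^g_c."""
    return (A_c * L_c[None, :] * factor_sets(A_c, p_c)).sum(axis=1)


# Hypothetical country: 2 goods (rows), 2 fields (columns).
A_c = np.array([[6.0, 2.0],
                [3.0, 3.0]])
p_c = np.array([1.0, 1.0])
L_c = np.array([1.0, 2.0])
print(predicted_output(A_c, p_c, L_c))  # [6. 6.]

# Making field 1 ten times more productive in every good leaves the
# allocation unchanged: only relative productivity matters.
A_scaled = A_c.copy()
A_scaled[:, 1] *= 10
print((factor_sets(A_scaled, p_c) == factor_sets(A_c, p_c)).all())  # True
```

Changing relative prices does move fields across crops. With $p_c = (2, 1)$, both fields in this example switch to good 0.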

II. Data

To assess the empirical performance of Ricardo’s ideas, we need data on actual output levels, which we denote by $\tilde{Q}^g_c$, as well as data to compute predicted output levels, which we denote by $Q^g_c$ in line with Section I. According to equations (1) and (2), $Q^g_c$ can be computed using data on productivity, $A^g_{cf}$, for all factors of production $f$; endowments of different factors, $L_{cf}$; and producer prices, $p^g_c$. We describe our construction of such measures here. Since the predictions of Ricardo’s theory of comparative advantage are fundamentally cross-sectional in nature, we work with the data from 1989 only; this is the year in which the greatest overlap in the required measures is available.

We use data on both agricultural output ($\tilde{Q}^g_c$) and producer prices ($p^g_c$) by country and crop from FAOSTAT. Output is equal to quantity harvested and is reported in tonnes. Producer prices are equal to prices received by farmers net of taxes and subsidies and are reported in local currency units per tonne. Imperfect data reporting to the FAO means that some output and price observations are missing. We first work with a sample of 17 crops and 55 countries that is designed to minimize the number of missing observations.⁴ In the remaining sample, whenever output data are missing we assume that there is no production of that crop in that country. Similarly, whenever price data are unreported for a given observation, both quantity produced and area harvested are also reported as zero in the FAO data. In these instances, we therefore replace the missing price entry with a zero.⁵

⁴ The countries are: Argentina, Australia, Austria, Bangladesh, Bolivia, Brazil, Bulgaria, Burkina Faso, Cambodia, Canada, China, Colombia, Democratic Republic of the Congo, Denmark, Dominican Republic, Ecuador, Egypt, El Salvador, Finland, France, Ghana, Honduras, Hungary, Iceland, Indonesia, Iran, Ireland, Israel, Jamaica, Kenya, Laos, Lebanon, Malawi, Mozambique, Namibia, Netherlands, Nicaragua, Norway, Paraguay, Peru, Poland, Romania, South Africa, Spain, Suriname, Sweden, Togo, Trinidad and Tobago, Tunisia, Turkey, USSR, United States, Venezuela, Yugoslavia and Zimbabwe. The crops are: barley, cabbages, carrots and turnips, cassava, coconuts, seed cotton, groundnuts (with shell), maize, onions (dry), rice (paddy), sorghum, soybeans, sugar cane, sweet potatoes, tomatoes, wheat, potatoes (white).

⁵ We have also experimented with replacing missing prices by their world averages across producing countries, adjusted for currency differences. The empirical results in Table 1 are insensitive to this alternative.

Our data on productivity ($A^g_{cf}$) come from version 3.0 of the Global Agro-Ecological Zones (GAEZ) project run by IIASA and the FAO (IIASA/FAO, 2012). We describe these data in detail in Costinot and Donaldson (2011) but provide a brief description here; see also Nathan Nunn and Nancy Qian (2011). The GAEZ project aims to make agronomic predictions about the yield that would obtain for a given crop at a given location for all of the world’s major crops and all locations on Earth. Data on natural inputs (such as soil characteristics, water availability, topography, and climate) for each location are fed into an agronomic model of crop production with distinct parameters for each variety of each crop. These models condition on a level of variable inputs, and GAEZ makes available the output from various scenarios in which different levels of variable inputs are applied. We use the scenario that corresponds to a ‘mixed’ level of inputs, where the farmer is assumed to be able to apply inputs differentially across sub-plots within his or her location, and in which irrigation is available. It is important to stress that the thousands of parameters that enter the GAEZ model are estimated from countless field and lab experiments, not from statistical relationships between observed country-level output data (such as the FAOSTAT data we use here to construct $\tilde{Q}^g_c$) and natural inputs.

Figure 1: An Example of Relative Productivity Differences. Notes: Ratio of productivity in wheat (in tonnes/ha) relative to productivity in sugarcane (in tonnes/ha). Areas shaded white have zero productivity in wheat (including areas with zero productivity in both wheat and sugarcane). Areas shaded dark, with the highest value (“>12,033”), have zero productivity in sugarcane and strictly positive productivity in wheat. Source: GAEZ project.

The spatial resolution of the GAEZ data is governed by the natural input with the coarsest resolution, the climate data. As a result, the GAEZ productivity predictions are available for each 5 arc-minute grid cell on Earth. The land area of such a cell varies by latitude but is 9.2 by 8.5 km at the Tropics. The median country in our dataset contains 4,817 grid cells, but a large country such as the U.S. comprises 157,797 cells. Since the grid cell is the finest unit of spatial heterogeneity in our dataset, we take each grid cell to be a distinct factor of production $f$ and the land area of each grid cell to be the associated endowment, $L_{cf}$. Hence our measure of the productivity of factor $f$ if it were to produce crop $g$ in country $c$, $A^g_{cf}$, corresponds to the GAEZ project’s predicted ‘total production capacity (tonnes/ha)’. We match countries (at their 1989 borders) to grid cells using GIS files on country borders from the Global Administrative Areas database.
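The way a 5 arc-minute cell shrinks with latitude can be approximated with a little spherical geometry. The sketch below assumes a spherical Earth with a mean radius of 6,371 km. Both that assumption and the helper names are ours. It is not necessarily how the cell areas $L_{cf}$ were computed for this paper. At the Tropic of Cancer (about 23.44°) it gives roughly 9.3 by 8.5 km, close to the figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: a spherical Earth with this mean radius


def grid_cell_dims_km(lat_deg: float, arc_minutes: float = 5.0) -> tuple:
    """Approximate (north-south, east-west) extent in km of a square grid cell of
    `arc_minutes` x `arc_minutes` centred at latitude `lat_deg`."""
    side_rad = math.radians(arc_minutes / 60.0)
    north_south = EARTH_RADIUS_KM * side_rad                    # same at every latitude
    east_west = north_south * math.cos(math.radians(lat_deg))  # meridians converge
    return north_south, east_west


def grid_cell_area_km2(lat_deg: float, arc_minutes: float = 5.0) -> float:
    """Approximate area of the cell; it shrinks with the cosine of latitude."""
    north_south, east_west = grid_cell_dims_km(lat_deg, arc_minutes)
    return north_south * east_west


for lat in (0.0, 23.44, 45.0, 60.0):  # equator, Tropic of Cancer, mid-latitudes
    ns, ew = grid_cell_dims_km(lat)
    print(f"lat {lat:5.2f}: {ns:.2f} km x {ew:.2f} km, {grid_cell_area_km2(lat):.1f} km^2")
```

Because cell areas differ by latitude, weighting each cell by its land area $L_{cf}$ (rather than counting cells) matters when aggregating to country-level output.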

A sample of the GAEZ predictions can be seen in Figure 1. Here we plot, for each grid cell on Earth, the predicted relative productivity in wheat compared to sugarcane (the two most important crops by weight in our sample). As can be seen, there is a great deal of heterogeneity in relative productivity throughout the world, even between just two of our 17 crops. In the next section we explore the implications of this heterogeneity, which is at the core of Ricardo’s theory of comparative advantage, for the pattern of international specialization across crops.
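The ratio plotted in Figure 1 and its role in specialization can be sketched as follows. The zero-handling mirrors the figure notes: NaN marks the cells shaded white and +inf marks the darkest cells. With only two crops and positive prices, equation (2) says a cell grows wheat exactly when this ratio exceeds the price of sugarcane relative to wheat. The cell values and the price ratio below are hypothetical.

```python
import numpy as np


def relative_productivity(a_wheat, a_sugar) -> np.ndarray:
    """Per-cell ratio of wheat to sugarcane productivity (tonnes/ha over tonnes/ha).

    Follows the Figure 1 conventions: NaN where wheat productivity is zero
    (shaded white), +inf where sugarcane is zero but wheat is positive (darkest).
    """
    a_wheat = np.asarray(a_wheat, dtype=float)
    a_sugar = np.asarray(a_sugar, dtype=float)
    ratio = np.full(a_wheat.shape, np.nan)
    wheat_pos = a_wheat > 0
    both_pos = wheat_pos & (a_sugar > 0)
    ratio[both_pos] = a_wheat[both_pos] / a_sugar[both_pos]
    ratio[wheat_pos & (a_sugar == 0)] = np.inf
    return ratio


def grows_wheat(ratio: np.ndarray, p_sugar_over_p_wheat: float) -> np.ndarray:
    """Two-crop version of equation (2): a cell grows wheat iff its relative
    productivity exceeds the relative price. NaN cells never qualify."""
    return np.greater(ratio, p_sugar_over_p_wheat,
                      out=np.zeros(ratio.shape, dtype=bool), where=~np.isnan(ratio))


# Four hypothetical cells (tonnes/ha).
ratio = relative_productivity([2.0, 0.0, 0.0, 3.0], [4.0, 5.0, 0.0, 0.0])
print(ratio)                    # 0.5, nan, nan, inf
print(grows_wheat(ratio, 0.4))  # wheat in cells 0 and 3 only
```

Raising the relative price threshold to 1.0 would move cell 0 out of wheat while cell 3, which cannot grow sugarcane at all, stays in wheat.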

III. Empirical Results

We are now ready to bring Ricardo’s ideas to the data. To overcome the identification problem highlighted by Deardorff (1984) and Heckman and Honore (1990), we take advantage of the GAEZ data, together with the other data described in Section II, to predict the amount of output ($Q^g_c$) that country $c$ should produce in crop $g$ according to Ricardo’s theory of comparative advantage, i.e., according to equations (1) and (2). We then compare these predicted output levels to those that are observed in the data ($\tilde{Q}^g_c$).

In the spirit of the ‘slope tests’ in the Heckscher-Ohlin-Vanek literature (see Donald Davis and David Weinstein, 2001), we implement this comparison by simply regressing, across countries and crops, data on actual output on measures of predicted output. Like Davis and Weinstein (2001), we will assess the empirical performance of Ricardo’s ideas by studying whether (i) the slope coefficient in this regression is close to unity and (ii) the coefficient is precisely estimated. Compared to these authors, however, we have little confidence in our model’s ability to predict absolute levels of output. The reason is simple: the model presented in Section I assumes that the only goods produced (using land) in each country are the 17 crops for which GAEZ productivity data are available. In reality there are many other uses of land, so the aggregate amount of land used to grow the 17 crops in our study is considerably lower than that assumed in our analysis. To circumvent this problem, we simply estimate our regressions in logs.⁶ Since the core aspect of Ricardian comparative advantage lies in how relative productivity levels predict relative quantities, we believe that a comparison of logarithmic slopes captures the essence of what the model described in Section I can hope to predict in this context.
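One way to see why logs help is as follows. If every predicted output is off by the same multiplicative factor $k$ (for instance because the model devotes all land to the 17 crops), then $\log Q^g_c$ shifts by $\log k$. That shift is absorbed entirely by the regression constant, and the slope is unchanged. The sketch below demonstrates this on simulated data. The data-generating process and the value of $k$ are assumptions for illustration only. A factor that varies by country or by crop would not be absorbed this way and would instead call for fixed effects.

```python
import numpy as np


def ols(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Intercept and slope from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simulated data only; these are not the paper's observations.
rng = np.random.default_rng(1)
log_pred = rng.normal(10.0, 2.0, size=200)
log_actual = 1.0 + 0.2 * log_pred + rng.normal(0.0, 1.0, size=200)

# Hypothetical: rescale every predicted output by the same factor k.
k = 0.3
intercept, slope = ols(log_actual, log_pred)
intercept_k, slope_k = ols(log_actual, log_pred + np.log(k))

print(np.isclose(slope, slope_k))                              # True: slope unaffected
print(np.isclose(intercept_k, intercept - slope * np.log(k)))  # True: constant absorbs k
```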


Our empirical results are presented in Table 1. All regressions include a constant and use standard errors that are adjusted for clustering by country to account for potential within-country (across-crop) correlation in data reporting and model misspecification. Column (1) contains our baseline regression. The estimated slope coefficient is 0.212 and the standard error is small (0.057).⁷ While the slope coefficient falls short of its theoretical value (one), it remains positive and statistically significant.
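Clustering by country allows the residuals of the 17 crops within a country to be arbitrarily correlated. The sketch below is a from-scratch version of the cluster-robust (sandwich) variance for a regression on a constant and one regressor. The small-sample scaling factor is a common convention and an assumption here, since the text does not say which correction was used. The data are simulated to mimic a 55-country by 17-crop layout with a shared country-level shock.

```python
import numpy as np


def ols_cluster_se(y, x, clusters):
    """OLS of y on a constant and x with cluster-robust (sandwich) standard errors.

    The small-sample factor G/(G-1) * (N-1)/(N-K) is a common convention and an
    assumption here; the text does not say which correction was used.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    clusters = np.asarray(clusters)
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]  # sum of x_i * e_i within the cluster
        meat += np.outer(score, score)

    n_groups = len(groups)
    scale = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    V = scale * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))


# Simulated panel: 55 "countries" x 17 "crops", with a shared country-level shock.
rng = np.random.default_rng(2)
country = np.repeat(np.arange(55), 17)
x = rng.normal(size=country.size)
y = 0.2 * x + rng.normal(size=55)[country] + rng.normal(size=country.size)
beta, se = ols_cluster_se(y, x, country)
print(f"slope {beta[1]:.3f}, clustered s.e. {se[1]:.3f}")
```

The point estimate is ordinary least squares. Clustering changes only the standard errors.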

The fact that Ricardo’s theory of comparative advantage does not fit the data perfectly should not be surprising. First, our empirical exercise focuses on land productivity and abstracts from all other determinants of comparative costs (such as factor prices that differ across countries and factor intensities that differ across crops) that are likely to drive agricultural specialization throughout the world. Second, the fit of our regressions depends not only on the ability of Ricardo’s theory to predict relative output levels conditional on relative productivity levels, but also on the ability of agronomists at the GAEZ project to predict productivity levels in each of 17 crops at 5 arc-minute grid cells throughout the world, conditional on the (counterfactual) assumption that all countries share a common agricultural technology.⁸ Third, while the spatial resolution of the GAEZ predictions is considerably finer than the typical approach to cross-country data in the trade literature (in which countries are homogeneous points), 5 arc-minute grid cells are still very coarse in an absolute sense. This means that there is likely to be a great deal of potential within-country heterogeneity that is being smoothed over by the GAEZ agronomic modeling. Yet despite these limitations of our analysis, Ricardo’s theory of comparative advantage still has significant explanatory power in the data, as column (1) illustrates.

⁶ In order to measure the gains from the economic integration of U.S. agricultural markets between 1880 and 2000, Costinot and Donaldson (2011) have developed a methodology that uses additional data on aggregate land use to correct for this problem. Applying that correction is computationally challenging here, due to the large number of fields in most countries, and is beyond the scope of the present paper.

⁷ In our logarithmic specification, all observations in which either output or predicted output is zero must be omitted. Out of the total of 935 potential observations (55 countries and 17 crops), 296 have zero output and 581 have zero predicted output; that is, our Ricardian model predicts more complete specialization than there is in the data. This should not be surprising given the potential for more spatial heterogeneity to exist in agricultural reality than can be modeled (due to data limitations) by GAEZ. In all, 349 observations have both non-zero output and non-zero predicted output and are hence included in the regression in column (1). We have explored a number of potential adjustments to correct the results in column (1) for these missing observations, including a Tobit regression (coefficient 0.213; s.e. 0.057) and adding one to all observations prior to taking logs (coefficient 0.440; s.e. 0.031).
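The counts reported in footnote 7 imply a full breakdown of the excluded observations by inclusion-exclusion. This assumes that all four counts refer to the same 935 country-crop cells. The derived numbers below are arithmetic consequences of the reported figures and are not themselves reported in the text.

```python
# Counts reported in footnote 7 (55 countries x 17 crops).
TOTAL = 55 * 17
ZERO_OUTPUT = 296
ZERO_PREDICTED = 581
INCLUDED = 349  # both output and predicted output positive

excluded = TOTAL - INCLUDED                          # at least one of the two is zero
both_zero = ZERO_OUTPUT + ZERO_PREDICTED - excluded  # inclusion-exclusion
only_output_zero = ZERO_OUTPUT - both_zero
only_predicted_zero = ZERO_PREDICTED - both_zero

print(TOTAL, excluded, both_zero, only_output_zero, only_predicted_zero)
# 935 586 291 5 290
```

On these numbers, the model predicts positive output where none is recorded in only 5 cases. By contrast, it predicts zero output in 290 cases where output is recorded, which is the pattern of excessive specialization the footnote describes.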


Columns (2) and (3) explore the robustness of our baseline estimate in column (1) to the inclusion of crop and country fixed effects, respectively. The rationale for these alternative specifications is that there may be crop- or country-specific tendencies for misreporting or model error. Such errors may be economic in nature if, say, some countries had higher intra-national price distortions, or agronomic in nature if, say, the GAEZ model predictions were relatively more accurate for some crops than others. Including such fixed effects can reduce the slope coefficient (to as low as 0.097, in column (3)), but these estimates are still statistically significantly different from zero. Thus the results in columns (2) and (3) show that Ricardo’s theory of comparative advantage continues to have explanatory power whether focusing on the across-country variation, as in column (2), or the across-crop variation, as in column (3).
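By the Frisch-Waugh-Lovell theorem, adding a full set of country (or crop) dummies gives the same slope as demeaning output and predicted output within each country (or crop) and regressing the demeaned variables. With country fixed effects, the slope is therefore identified only from variation across crops within a country, and with crop fixed effects only from variation across countries within a crop. The sketch below uses simulated data in which country effects are correlated with the regressor. The data-generating process is an assumption for illustration.

```python
import numpy as np


def demean_by_group(v, groups):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    v = np.asarray(v, dtype=float)
    groups = np.asarray(groups)
    out = v.copy()
    for g in np.unique(groups):
        in_g = groups == g
        out[in_g] -= v[in_g].mean()
    return out


def fixed_effects_slope(y, x, groups):
    """Slope of y on x with one set of group fixed effects (e.g. country or crop)."""
    y_w = demean_by_group(y, groups)
    x_w = demean_by_group(x, groups)
    return float(x_w @ y_w / (x_w @ x_w))


# Simulated data: country effects correlated with x bias the pooled slope.
rng = np.random.default_rng(3)
country = np.repeat(np.arange(55), 17)
effect = rng.normal(size=55)[country]
x = effect + rng.normal(size=country.size)
y = 0.2 * x + 2.0 * effect + rng.normal(size=country.size)

pooled = np.polyfit(x, y, 1)[0]
within = fixed_effects_slope(y, x, country)
print(f"pooled {pooled:.3f}, with country fixed effects {within:.3f}")
```

The demeaning route avoids building a large dummy matrix, though with 55 countries or 17 crops either approach is cheap.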

Finally, columns (4) and (5) investigate the extent to which our estimates are driven by particular components of the sample. Column (4) estimates the slope only among the 28 countries that are at or above the median in terms of agricultural production (by weight). And column (5) estimates the slope only on the 9 crops that are the most important (by weight) in global production. In both cases the estimated slope coefficient is similar (within one standard error) to our baseline estimate in column (1).

⁸ The methodology developed in Costinot and Donaldson (2011) uses data on harvested area to allow for and estimate unrestricted crop-and-region productivity shocks. Again, because of the high number of fields per country, applying this correction to the current paper is computationally challenging.

Table 1: Comparison of Predicted Output to Actual Output

Dependent variable: log (output)

                          (1)        (2)        (3)        (4)              (5)
log (predicted output)    0.212***   0.244***   0.096**    0.143**          0.273***
                          (0.057)    (0.074)    (0.038)    (0.062)          (0.074)
Sample                    all        all        all        major countries  major crops
Fixed effects             none       crop       country    none             none
Observations              349        349        349        226              209
R-squared                 0.06       0.26       0.54       0.04             0.07

Notes: All regressions include a constant. Standard errors clustered by country are in parentheses. ** indicates statistically significant at the 5% level and *** at the 1% level.
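The significance stars in Table 1 can be recovered from the reported coefficients and standard errors alone. The sketch below uses two-sided normal critical values, which is an approximation; with relatively few country clusters a t reference distribution would be slightly more conservative. The dictionary simply transcribes Table 1.

```python
from statistics import NormalDist

# (coefficient, clustered standard error) by column, as reported in Table 1.
TABLE_1 = {
    1: (0.212, 0.057),
    2: (0.244, 0.074),
    3: (0.096, 0.038),
    4: (0.143, 0.062),
    5: (0.273, 0.074),
}

CRIT_5 = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96
CRIT_1 = NormalDist().inv_cdf(0.995)  # two-sided 1% critical value, about 2.58


def stars(coef: float, se: float) -> str:
    """Significance stars from the ratio coef/se, using normal critical values."""
    z = abs(coef / se)
    if z > CRIT_1:
        return "***"
    if z > CRIT_5:
        return "**"
    return ""


for column, (coef, se) in TABLE_1.items():
    print(column, round(coef / se, 2), stars(coef, se))
```

Columns (3) and (4) have ratios between the 5% and 1% thresholds, matching their two stars in the table.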

IV. Concluding Remarks

Ricardo’s theory of comparative advantage is one of the oldest and most distinguished theories in economics. But it is a difficult theory to bring to the data. To do so using conventional data sources, one needs to make untestable functional form assumptions about how productive a given factor of production would be at the activities it is currently, and deliberately, not doing. In this paper we have argued that the predictions of agronomists (i.e., the scientists who specialize in modeling how agricultural crops would fare under a wide range of possible growing conditions) can be used to provide the missing data that make Ricardo’s ideas untestable in conventional settings.

We have combined the data from a particular group of agronomists, those working on the GAEZ project as part of the FAO, with producer price data from the FAO to assess the empirical performance of Ricardo’s ideas across 17 agricultural crops and 55 major agriculture-producing countries in 1989. We have asked a simple question: How do output levels predicted by Ricardo’s theory compare to those that are observed in the data? Despite all of the real-world considerations from which Ricardo’s theory abstracts, we find that a regression of log output on log predicted output has a (precisely estimated) slope of 0.21. Ricardo’s theory of comparative advantage is not just mathematically correct and non-trivial; it also has significant explanatory power in the data, at least within the scope of our analysis.


References

Costinot, Arnaud. 2009. “An Elementary Theory of Comparative Advantage.” Econometrica 77(4): 1165-1192.

Costinot, Arnaud, and Dave Donaldson. 2011. “How Large Are the Gains from Economic Integration? Evidence from US Agriculture 1880-2000.” Unpublished.

Costinot, Arnaud, Dave Donaldson, and Ivana Komunjer. 2011. “What Goods Do Countries Trade? A Quantitative Exploration of Ricardo’s Ideas.” Review of Economic Studies, forthcoming.

Davis, Donald, and David Weinstein. 2001. “An Account of Global Factor Trade.” American Economic Review 91(5):

Deardorff, Alan. 1984. “Testing Trade Theories and Predicting Trade Flows.” In Handbook of International Economics, Volume 1, edited by R.W. Jones and P.B. Kenen. Amsterdam: North-Holland.

Eaton, Jonathan, and Samuel Kortum. 2002. “Technology, Geography, and Trade.” Econometrica 70(5): 1741-1779.

Heckman, James, and Bo Honore. 1990. “The Empirical Content of the Roy Model.” Econometrica 58(5): 1121-

IIASA/FAO. 2012. Global Agro-ecological Zones (GAEZ v3.0). IIASA, Laxenburg, Austria and FAO, Rome, Italy.

Nunn, Nathan, and Nancy Qian. 2011. “The Potato’s Contribution to Population and Urbanization: Evidence from a Historical Experiment.” Quarterly Journal of Economics 126(2): 593-650.