U.S. Food and Drug Administration
Office of the Commissioner/Economics Staff
Internship Reflection – BSOS386
Upon the completion of my previous internship, I was excited to acquire greater insight
into the microeconomic analysis conducted at the federal level. Within the FDA Commissioner's
office lies a group of about 35 regulatory economists, whom I had the pleasure of working with
throughout the first half of my senior year. This is the largest staff I have been a part of to date,
which proved both beneficial and challenging.
Excited to take in as much as I could in the little time I had, I hit the ground running. The
Economics Staff had recently tapped into the vast network of ScanTrack data provided through
the agency's subscription to AC Nielsen. Nielsen's ScanTrack data contains records of grocery and convenience store goods purchased from 2002 to the present, identified by Universal Product Codes (UPCs). The data is broken down into hundreds of categorical variables, with multi-thousand-line spreadsheets for each major department and subcategory. Having worked with similarly sized data at my other two internships, I was prepared to use many of the same navigational and analytical tactics to help me transition from corporate taxes into something more tangible, like bakery products.
The FDA had recently launched a Sodium Reduction Initiative to combat the rising consumption of processed foods, which are notorious for their high preservative and sodium content. Excessive sodium intake poses severe long-term health risks, with the potential for broader adverse spillover effects. The agency set an initial target level for daily consumption that has since been revisited for stricter regulatory policy proposals. The Economics Staff was tasked with quantifying a new target and sought to investigate previous and current consumer trends in Nielsen's ScanTrack data.
ScanTrack data is released periodically in 'waves' to give subscribers the most up-to-date sales numbers. Two senior economists had recently begun the initial data sourcing and analysis, running the latest Wave 5 data through SAS. My first project was to audit their reconfiguration and identify the discrepancies between the inputs and outputs. In the 'Raw'
ScanTrack data, the product description column combined the product name with its size and unit of measure, a possible root cause of the post-SAS misalignment. My first instinct was to separate the two in Stata, before realizing there had to be a simpler method for creating the two columns in Excel. Sure enough, with a little troubleshooting I was able to split them using the 'Convert Text to Columns' wizard. Within the conversion window, I set the parsed column destinations using a fixed-width break line to send each value to a new 'Product Description' or 'Size' column. This approach could be applied to the hundreds of ScanTrack spreadsheets, but given the scope of this project I only reformatted 'Wave 5 – Part 1', passing my suggestion on to my associates.
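In Python, the same split might look like the minimal helper below. It assumes the size is always the final "number plus unit" pair at the end of the description, which is a simplification of the real ScanTrack layout; the sample strings are invented for illustration.

```python
def split_description(desc: str) -> tuple[str, str]:
    """Split 'PRODUCT NAME 12 OZ' into ('PRODUCT NAME', '12 OZ').

    Assumes the size is always the last two whitespace-separated tokens
    (a number and a unit), mirroring the fixed-width split done in Excel.
    """
    name, qty, unit = desc.rsplit(maxsplit=2)
    return name, f"{qty} {unit}"

print(split_description("WHEAT CRACKERS 12 OZ"))  # → ('WHEAT CRACKERS', '12 OZ')
```

Like the Excel wizard, this only works when every row follows the same trailing-size convention; rows without a size field would need their own handling.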
One of the most noticeable issues was that the SAS transformation eliminated the product descriptions, making it very difficult to cross-reference the various reference variables between input and output spreadsheets. My best bet was to align the spreadsheets by UPC, a reference variable that identifies an exact good. After adding a few filters to the primary columns, I was able to line up the pre-SAS and post-SAS spreadsheets. Later, I filtered Wave 5 even further by adding two more tiers for month and year, transcribing the months into their respective number values. Now that this discrepancy was sorted out (no pun intended), I could revisit the key 'Average Unit Price' and 'Sales Units' variables. Using Excel's IF function, I created an output column, filled down the full sheet, to identify which goods had discrepancies after the SAS transformation. The column would return 'MATCH' or 'NO MATCH', coded in green and red respectively. I used the same thorough approach to cross-reference the ~30 vitamin and nutrient variables associated with these goods, using a series of filters on stacked datasets to expedite my assignment. I was later asked to condense the data even further in a separate spreadsheet, eliminating duplicate products. Referencing a Do-File from my time at the NJ Treasury, I rewrote the Stata code for eliminating duplicates in this new spreadsheet. The statistical significance of my associates' results relied primarily on the quality of the underlying data, a pitfall I was determined to address. While I was unable to rewrite the SAS code to fix these issues, I was very happy to contribute my time to my associates' new project and to work with Nielsen's highly explorable datasets.
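The logic behind that IF column is easy to sketch in Python: key both spreadsheets by UPC, then flag each good as 'MATCH' or 'NO MATCH' depending on whether the pre- and post-SAS values agree. The UPCs and prices below are invented placeholders, not ScanTrack values.

```python
# Hypothetical pre-/post-SAS 'Average Unit Price' values keyed by UPC
pre_sas  = {"001": 2.49, "002": 3.10, "003": 0.99}
post_sas = {"001": 2.49, "002": 3.15, "003": 0.99}

# Mirror the Excel IF column: 'MATCH' when the values agree, 'NO MATCH' otherwise
flags = {
    upc: "MATCH" if pre_sas[upc] == post_sas.get(upc) else "NO MATCH"
    for upc in pre_sas
}
print(flags)  # → {'001': 'MATCH', '002': 'NO MATCH', '003': 'MATCH'}
```

A missing UPC on the post-SAS side also comes out as 'NO MATCH' here, since `post_sas.get(upc)` returns `None`, which never equals a price.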
The first couple of months of my internship were spent getting to know the layout of Nielsen's datasets and familiarizing myself with their project plan; my thoroughness there aimed to build on my Excel skills while achieving the most accurate results for my colleagues.
I also worked simultaneously on a couple of other projects to learn more about the research being conducted in other industries. One economist was working on a regulation draft for sunscreen and cosmetics that was missing a few components. The data my colleague was using lacked brand and manufacturer/distributor names for roughly 4,000 of 11,000 products. This seemed daunting at first glance, but thankfully it wasn't the first time I had reconstructed largely missing data. Using DailyMed, I could simply plug in the FDA-provided National Drug Code (NDC) for each product to return the requested brand name. This went smoothly until I came across some foreign makeup brands that weren't registered in DailyMed and had drug labels in another language. These products were frustrating, but I found that their companies had other goods in the spreadsheet listed under alternate English names. With those names in hand, I could search for the companies' legal names and parent companies where applicable. Using DailyMed for this project was a great way to get acquainted with the cosmetic industry's biggest manufacturers and the various regulatory speed bumps they encounter.
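One wrinkle in NDC lookups worth noting: hyphenated NDCs come in 4-4-2, 5-3-2, and 5-4-1 segment formats, and are commonly zero-padded to a uniform 11-digit (5-4-2) form before matching across sources. The helper below is purely illustrative and was not part of the workflow described above.

```python
def normalize_ndc(ndc: str) -> str:
    """Pad a hyphenated NDC (4-4-2, 5-3-2, or 5-4-1) to the 11-digit 5-4-2 form."""
    labeler, product, package = ndc.split("-")
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)

print(normalize_ndc("1234-5678-90"))  # → 01234567890
```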
Shortly after the completion of my cosmetics project, I sought out the lone economist on staff who works with pharmaceutical, medical device, and biotechnology policies (as appetizing as food safety was). She had been meaning to start a research study on how long it takes to develop a drug, with regard to molecular and target development, and how these processes have changed over time under various circumstances. I had never had the opportunity to lay the groundwork for a research project and was determined to compile as much as I could. With my FDA email, I could access a database called Citeline, a.k.a. Pharmaprojects, which compiles clinical drug development lifecycle events and statistics. Without a chemistry or biology background it was difficult to understand what many of these drug components were, but from an economic perspective the commercial codes and targets were just like any other variables. I have generally found it easier to work with tangible data, but had eagerly sought an opportunity to work with data from an entirely foreign industry such as pharmaceuticals. My associate provided me with an initial spreadsheet containing roughly 70 targeted drugs' oldest active ingredient, initial public introduction, and manufacturer, tasking me with finding their trade/commercial names, drug codes, and targets.
After exploring the database's interface, I was able to identify the components asked of me. It took me a few entries to figure out that Pharmaprojects could accept
more than one drug input per search entry, which eliminated the unnecessary repetitive step of
searching one at a time. The individual drug development timeline pages wouldn’t let me copy
and paste the components I needed, which was especially inconvenient when it came to
irregularly complex targets such as Phosphoribosylglycinamide Formyltransferase. Lastly, I
cross-referenced the drug codes in PubMed to see when they were initially introduced to the
public, pointing out any inconsistencies between them and FDA records. While compiling these
metrics, I also completed some basic research on the firms that produced these drugs. Many of
the big drug producers like Bristol-Myers Squibb, GlaxoSmithKline, Merck, Eli Lilly, etc., had
previously acquired other firms, and so I noted these M&As where applicable. The
Pharmaprojects database contained other very interesting statistics and facts that I'm not allowed to disclose but that were fascinating to read about. My associate was very pleased with my work getting the project going after previous postponements, but was unfortunately unable to revisit its next steps before my departure. I look forward to following her ongoing research.
In the final weeks of my internship, I began working on a larger project with another senior economist for whom I had completed smaller tasks earlier in the semester. This economist
had recently begun working with recalled products through a series of Recall Enterprise System
(RES) and Reportable Food Registry (RFR) datasets to estimate the risk associated with
distributing non-compliant goods and the potential deadweight loss imposed on the food market
because of them. We were most interested in FY2013, as it had the most abundant and diverse
set of goods. The RES/RFR datasets gathered some basic product characteristics, such as manufacturer, recall date, and primary/secondary agents, but lacked the corresponding prices.
Immediately, I suggested searching the raw Nielsen spreadsheets for prices, reasoning that at least some of these products had to be recorded there. Unfortunately, the raw Nielsen data lists truncated product names, making it very difficult to find an exact product. We later decided that it would make the most sense to find a close comparable for each RES/RFR product, as it would return similar results.
I added another tab to our initial spreadsheet to line up the commodities and their reference categories, with an additional column labeled 'Nielsen Match' for the truncated Nielsen products. In case I had to return to one of the hundreds of Nielsen spreadsheets, I added a few more columns for UPC, department, and category, before later adding the key columns: Total Sales Revenue (2013), Total Equivalent Units, Price per 16 Ounces, Serving Size, Grams per (Price per 16 Ounces), and the weekly totals for revenues and quantities. Searching through the Nielsen departments was very tedious, but by filtering on several key variables such as 'Flavor', I could find matches relatively quickly. I set up a SUM function to tally the aggregate annual revenue and units sold for each good; this way all I had to do was copy and paste over from Nielsen, and Excel would keep a running total for each match, as well as for the entire recall spreadsheet, on both variables. I used the same methodology for the 'Price per 16 Ounces' column, the equivalized base used by Nielsen to compare goods that normally would be incomparable (e.g., fajita mix vs. assorted cookies). With a simple division formula filled down the column, I divided 'Total Sales Revenue (2013)' by 'Total Equivalent Units' to arrive at the 'Price per 16 Ounces', giving us a rough estimate of what a particular good would cost at current market value, approximated by the sales volume referenced through Nielsen. Next, I compiled the serving sizes for each good using the FDA's Serving Size Final Rule, while also converting the respective total-volume serving sizes to grams. The last component I would add to my associate's report was the reference amount customarily consumed, an externally determined consumption reference enforced through the FDA's Serving Size Final Rule. Now that everything was in place, I could crunch the totals and determine one of the report's key metrics: Mean Retail Sales Dollars per Serving.
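In practice this was a SUM formula and a filled-down division in Excel, but the running totals and the equivalized price boil down to simple sums and one division, sketched here in Python with invented weekly figures:

```python
# Toy weekly (revenue, equivalized 16 oz. units) figures for one matched good;
# these are invented for illustration, not Nielsen data
weekly = [(1250.00, 480), (990.50, 400), (1510.25, 610)]

# Running totals, as the SUM columns kept in the spreadsheet
total_sales_revenue = sum(rev for rev, _ in weekly)
total_equivalent_units = sum(units for _, units in weekly)

# Price per 16 ounces: annual revenue divided by equivalized units
price_per_16_oz = total_sales_revenue / total_equivalent_units
print(round(price_per_16_oz, 2))  # → 2.52
```

Each new week pasted in from Nielsen simply extends the totals, exactly as the Excel SUM column kept a running tally.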
I have provided a sample row with slightly altered figures to depict my findings for my
recall project.
The 'Total Equivalent Units (16 oz.)' figure is the total number of equivalized units of this particular good sold in FY2013; 'Retail Sales Dollars ($)' is the total revenue accrued from annual sales of the given good; 'Equivalized Base (oz.)' is the 16-ounce reference metric used to control for variation in food measurements; 'Total Number (#) of Retail Ounces (oz.)' is the Total Equivalent Units (16 oz.) multiplied by the Equivalized Base (oz.); 'Total Number (#) of Retail Grams (g)' is the Total Number (#) of Retail Ounces (oz.) multiplied by 28.3495, the conversion rate from ounces to grams; 'Total Goods in Servings (servings)' is the Total Number (#) of Retail Grams (g) divided by the reference amount customarily consumed (in this case 50 grams); and finally, the 'Mean Retail Sales Dollars ($) per Serving' is the Retail Sales Dollars ($) divided by the Total Goods in Servings (servings).

Total Equivalent Units (16 oz.):            4,956,992,908
Retail Sales Dollars ($):                   $9,162,671,802.18
Equivalized Base (oz.):                     16
Total Number (#) of Retail Ounces (oz.):    79,311,886,528
Total Number (#) of Retail Grams (g):       2,248,452,327,126
Total Goods in Servings (servings):         44,969,046,543
Mean Retail Sales Dollars ($) per Serving:  $0.20

In this example, the cost, defined as the lost revenue incurred by the manufacturer, can be approximated as $0.20 per serving. These numbers again are altered for my internship review, and do not depict statistically significant findings or represent approved/finalized FDA figures.
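As a check on the arithmetic, the chain of conversions can be reproduced in a few lines of Python, using the altered sample figures from the row above:

```python
OZ_TO_GRAMS = 28.3495  # ounces-to-grams conversion rate

# Altered sample figures from the table above
total_equivalent_units = 4_956_992_908    # equivalized 16 oz. units sold, FY2013
retail_sales_dollars = 9_162_671_802.18   # total annual revenue ($)
equivalized_base_oz = 16                  # Nielsen's 16-ounce reference base
racc_grams = 50                           # reference amount customarily consumed

total_retail_oz = total_equivalent_units * equivalized_base_oz
total_retail_grams = total_retail_oz * OZ_TO_GRAMS
total_servings = total_retail_grams / racc_grams
mean_dollars_per_serving = retail_sales_dollars / total_servings

print(round(mean_dollars_per_serving, 2))  # → 0.2
```

Running the chain end to end reproduces the table: roughly 79.3 billion retail ounces, about 45 billion servings, and a mean of $0.20 per serving.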
This example was sourced from a similar evaluation that measured an entire category of food (e.g., bread) and depicts a much larger result, representative of the countless firms that produce bread. My project, conversely, analyzed a select group of recalled products spanning multiple food categories, so the numbers I found would appropriately be much smaller per serving than the one described above. This is one of many statistics that will be used by my associate in his final public health risk estimation once he compiles the remaining ones and runs them through @RISK. This project was very data intensive, providing me with another opportunity to strengthen my Excel skills and gain more experience running basic Stata commands to clean my datasets. While I was working on my recall and pharmaceutical projects, I also began teaching myself SAS programming through one of the FDA's many software subscriptions. I usually worked through these training modules in my spare time and during the occasional break in the action at work, to strengthen my statistical toolkit and gain a better understanding of computer programming. The syntax and interface are a little different from Stata's and SPSS's, but certainly nothing overwhelming. I was very fortunate to have access to a high-demand software skill, not taught at UMD, that could help me familiarize myself with new analytical methods to supplement my theoretical coursework. I am eager to continue teaching myself SAS over the next few years to improve my software versatility.
Overall, I am happy to reflect on this experience as a great microeconomic opportunity to
gain specialized insight into health economics and government regulation. This time there was no other intern to collaborate with, and so my ability to communicate across a relatively large and dispersed department was certainly tested. Despite those conditions, everyone on staff was
very helpful and accommodating during my tenure, allowing me to get as much out of these
short few months as I could. While I will miss my fellow economists at the FDA dearly, I am
excited to return to Washington, D.C. in my final semester. I will conclude my undergraduate
internship experience at the executive agency level, working alongside macroeconomists in the
U.S. Department of the Treasury’s Office of International Affairs.