Collecting Tweets Using R and the Twitter Search API

Sentiment analysis and natural language processing (NLP) have always fascinated me, yet I never really understood the inner workings of this type of analysis and never made the time to dig into the science. Until recently, I didn’t even know that you could collect tweets for free using Twitter’s Search and Streaming APIs. A few days and several blogs later, I’ve set up R to work with both APIs. Since much of the information was scattered across different websites, I thought I’d give a general recap here. This first post deals with using the Twitter Search API and R to collect tweets. Before I dig into the code, there are some notes I want to touch on, which I learned only later from Twitter’s documentation. Continue reading…

Out of Scope, Out of Money

The U.S. Government Accountability Office (GAO) recently released a report on its investigation into the difficulties and costs associated with launching HealthCare.gov, the federal health insurance marketplace implemented as a result of the Patient Protection and Affordable Care Act (PPACA), informally known as Obamacare. The report, titled HEALTHCARE.GOV: Ineffective Planning and Oversight Practices Underscore the Need for Improved Contract Management, provides a timeline of events that led to more than $200 million (or almost 27%) in cost overruns, with the government paying approximately $946 million from fiscal year 2010 through March 2014. Continue reading…

The Most Comprehensive EHR Dataset is Now Available

After working with five different datasets published by the Centers for Medicare and Medicaid Services (CMS), I have made the most comprehensive dataset on EHR use, incentive payments, provider demographics, EHR vendor information and more available for download on my website. This new dataset contains the following information: Continue reading…


Visualizing CoveredCA Enrollments

CoveredCA previously released enrollment numbers for the Affordable Care Act (ACA) health insurance exchange. According to its reports, almost 1.4 million (1,395,929) Californians enrolled in insurance plans sold on the exchange through April 15, 2014, with Los Angeles, Orange and San Diego counties taking the top three spots in total enrollments (see table below). CoveredCA beat its projection of enrolling at least 400,000 to 700,000 individuals (according to a September 2013 document) by triple-digit percentages, already reaching levels forecast for 2015-2016.

County            Total Enrolled    % of Total Enrolled    % of County Population    County Population Estimate
Los Angeles       400,889           28.7%                  4.0%                      10,017,068
Orange            131,804           9.1%                   4.2%                      3,114,363
San Diego         121,900           8.7%                   3.8%                      3,211,252
Riverside         69,350            5.0%                   3.0%                      2,292,507
Alameda           65,171            4.7%                   4.1%                      1,578,891
Santa Clara       64,924            4.7%                   3.5%                      1,862,041
San Bernardino    53,623            3.8%                   2.6%                      2,088,371
Sacramento        43,796            3.1%                   3.0%                      1,462,131
San Francisco     40,826            2.9%                   4.9%                      837,442
Contra Costa      39,349            2.8%                   3.6%                      1,094,205

Continue reading…