You have probably heard that Google recently released a set of mobility reports. The site hosting them, the so-called COVID-19 Community Mobility Reports, opens with the following sentence: “See how your community is moving differently due to COVID19”.
What is it about?
Google offers a Location History feature in its services that records users’ locations and, consequently, their movement. Users can access this data and disable the feature at any time. According to Google, the feature is disabled by default and must be activated voluntarily. Based on this information, Google compared how and where these individuals moved in a period prior to the COVID-19 outbreak with how and where they are moving now, during the outbreak. There is a clear bias here: people who do not have a cell phone or tablet, or who have not activated this feature, are left out of the sample, and this can affect the conclusions of the report. Still, it’s worth a look.
A simple project tutorial with R/RMarkdown, Packrat, Git, and DVC.
The pain of managing a Data Science project
Something has been bothering me for a while: reproducibility and data tracking in data science projects. I had read about some technologies but never really tried any of them out until recently, when I couldn’t stand the feeling of losing track of my analyses anymore. At some point, I decided to give DVC a try after some friends, mostly Flávio Clésio, suggested it to me. In this post, I will talk about Git, DVC, R, RMarkdown and Packrat, everything I think you may need to manage your Data Science project, but the focus is definitely on DVC.
Depending on your background, you have already heard of spurious dependence in one way or another. It goes by the names of spurious association, spurious dependence, the famous quote “correlation does not imply causation”, and other versions of the same idea: you cannot say that X necessarily causes Y (or vice versa) solely because X and Y are associated, that is, because they tend to occur together. Even if one of the events always happens before the other, let’s say X preceding Y, you still cannot say that X causes Y. There is a statistical test, very famous in economics, known as Granger causality.
The post hoc ergo propter hoc fallacy is also known as “after this, therefore because of this”. It’s pretty clear today that Granger causality is not an adequate tool to infer causal relationships, and this is one of the reasons why, when X and Y are tested with the Granger causality test and an association is found, it is said that X Granger-causes Y instead of X causes Y. Maybe it’s not clear to you why the association between two variables, together with the notion that one always precedes the other, is not enough to say that one is causing the other. One explanation for a hypothetical situation would be a third, lurking variable C, also known as a confounder, that causes both events, a phenomenon known as confounding. By ignoring the existence of C (which in some contexts happens by design and is a strong assumption called unconfoundedness), you fail to realize that the events X and Y are actually independent once this third variable C, the confounder, is taken into consideration. Since you ignored it, they seem dependent, associated. A very famous and straightforward example is the positive correlation between (a) ice cream sales and death by drowning or (b) ice cream sales and homicide rate.
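To make the confounding argument concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the coefficients, the sample size, the seed, and the helper names `pearson` and `residuals` are assumptions, not something taken from a real dataset. C is drawn at random, X and Y each depend on C plus their own independent noise, and neither one influences the other. The raw correlation between X and Y comes out strong, while the correlation that remains after removing the linear effect of C from both (a partial correlation) should be close to zero.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(target, predictor):
    """What is left of target after a least-squares line on predictor is removed."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predictor) / n
    slope = sum(
        (p - mean_p) * (t - mean_t) for p, t in zip(predictor, target)
    ) / sum((p - mean_p) ** 2 for p in predictor)
    intercept = mean_t - slope * mean_p
    return [t - (intercept + slope * p) for p, t in zip(predictor, target)]


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000

# C is the confounder; X and Y depend on C but not on each other.
# The coefficients 2.0 and 3.0 are arbitrary choices for this sketch.
c = [rng.gauss(0, 1) for _ in range(n)]
x = [2.0 * ci + rng.gauss(0, 1) for ci in c]
y = [3.0 * ci + rng.gauss(0, 1) for ci in c]

raw_corr = pearson(x, y)
partial_corr = pearson(residuals(x, c), residuals(y, c))

print(f"correlation of X and Y, ignoring C: {raw_corr:.3f}")
print(f"correlation of X and Y, given C:    {partial_corr:.3f}")
```

Regressing C out of both variables only handles a linear confounder like the one simulated here; it is one simple way to condition on C, not a general recipe for causal inference.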
The bad side of leaving annoying bugs to be fixed later is that at some point you forget important intel about them. For some reason, my Ctrl+Shift+Arrow shortcut wasn’t working. Sometimes, while using gedit or LibreOffice Writer, I would try to select a few words with Ctrl+Shift+Arrow and it would fail. The cursor wouldn’t move 😡
I use RStudio with vim keybindings, so if I want to select a few words I use b or w in visual mode, but sometimes I would try Ctrl+Shift+Arrow in edit mode and the cursor would just sit there, stuck 🙄 I had edited my Openbox configuration so much that I thought it could have been my doing. I backed up my configuration file, restored the “factory configuration”, closed everything, restarted it and it was fixed! After a while, I noticed it wasn’t. At some point, I remembered the source of the issue: Opera! Whenever Opera is running, some shortcuts stop working.
After some searching, I found this website explaining why this happens and suggesting a dirty workaround. Honestly, this is ridiculous. And to think that some people put up with killing a built-in extension twice every time they run Opera so that system-wide (or other applications’) shortcuts can work as they should… Come on, Opera! Fortunately, I found a permanent solution, much better than the recurrent double homicide of the built-in extension. In GNU/Linux, or at least in Ubuntu, the Preferences file can be found in $HOME/.config/opera/Preferences. In my case, it’s /home/mribeirodantas/.config/opera/Preferences. I used an online JSON Editor to load my Preferences file from my machine and remove the command sub-tree inside extensions, as the solution suggests. You will see a small box to the left of each row. Left-click on the box and choose to remove the row, just like in the image below.
After that, click on save to save this file to your machine. Now comes the part that the solution did not mention! When you close your Opera browser, it rewrites the Preferences file. So if you replace the original Preferences file with the new one, with the commands row removed, while Opera is still open, it will change it back when you close it. So save the new file in a different place, close Opera, and only then replace the Preferences file with your new file. The JSON Editor I used adds .json to the end of the filename, so you have to rename it to “Preferences” (not Preferences.json) and replace the file in the .config/opera folder. Then, open your Opera browser and it should be fixed 😉
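If you would rather not upload your Preferences file to an online editor, the same edit can be sketched locally in Python. This is only a sketch under assumptions: I take the sub-tree to be a key named "commands" directly under "extensions" (adjust `key` if your file names it differently), and the function names and the `Preferences.cleaned` output name are hypothetical. As in the steps above, the cleaned copy is written to a different place, so you still close Opera first and only then copy it over the original.

```python
import copy
import json
from pathlib import Path


def strip_commands(prefs, key="commands"):
    """Return a copy of prefs without extensions -> key (assumed layout)."""
    cleaned = copy.deepcopy(prefs)  # leave the loaded original untouched
    extensions = cleaned.get("extensions")
    if isinstance(extensions, dict):
        extensions.pop(key, None)  # no error if the key is already gone
    return cleaned


def write_cleaned_copy(source, destination):
    """Read the Preferences JSON at source and write a cleaned copy elsewhere."""
    prefs = json.loads(Path(source).read_text(encoding="utf-8"))
    cleaned = strip_commands(prefs)
    Path(destination).write_text(json.dumps(cleaned), encoding="utf-8")


# Small in-memory demonstration with made-up content.
example = {"extensions": {"commands": {"some": "shortcut"}, "settings": {}}}
print(strip_commands(example))
```

On a real machine you would call something like `write_cleaned_copy(Path.home() / ".config/opera/Preferences", Path.home() / "Preferences.cleaned")`, close Opera, and then move the cleaned file into .config/opera under the name “Preferences”. Keeping a backup of the original file first is a good idea.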
I really like Opera. Otherwise, I think I would have abandoned it after a few attempts to fix this bug. For now, I will stick to it 🙂