Gold Standard Data Science
The first stage in a data science project is often to collect training data.
Gold standard data science. A gold standard is a monetary system in which the standard economic unit of account is based on a fixed quantity of gold. Both meanings are different because for example in medicine dealing with conditions that would require an autopsy to have a perfect diagnosis the gold standard test would be the best one that keeps the patient alive instead of the autopsy. Okay so i would claim that the gold standard for reproducibility is that others can use your exact environment as it was at the time of the experiment. The same software the same software environment the same hardware the same lighting in the room let s say.
This is the gold standard for crystal structure analysis. The diffraction pattern gives a unique fingerprint of the crystal structure. Stories by matt 0 on medium. A number of other great resources i d read in the past 1 2 3 4 inspired me to create a github repo for a gold standard workflow for setting up a new data science project directory.
Jenny bryan a statistician and data scientist from ubc who has strong views on the proper layout of r scripts workflows and file organization naming said it. To leave a comment for the author please follow the link and comment on their blog. In medicine and statistics a gold standard test is usually the diagnostic test or benchmark that is the best available under reasonable conditions. The gold standard was widely used in the 19th and early part of the 20th century.
The gold standard of data science project management was originally published in towards data science on medium where people are continuing the conversation by highlighting and responding to this story. Is really the appropriate form for reproducible data science. Gold standard is a particular case of external criterion. However getting a good data set is surprisingly tricky and takes longer than one expects.
A statistical or machine learning algorithm wants to predict a criterion which state isn t dependent on the algorithm otherwise criterion is contaminated. Well gold standard is usually a dataset or a set of results which serves as the.