"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 10, 2020

Model Documentation and Coding Guidelines - Python

This paper was very useful. It covers data source, purpose, model accuracy, and recommendations. The key metrics are shown in a screenshot from the paper.


Structuring Machine Learning Projects

ML Experiment Parameters

  • Model Parameters
  • Learning Rate
  • Number of Epochs Run
  • Training Loss
  • Validation Loss
  • CPU %
  • Memory %
  • Disk usage
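
One way to keep these parameters together is a small record per training run. The sketch below is only an illustration: the class name `ExperimentRecord`, its field names, and the sample values are assumptions, not something taken from the paper. CPU and memory percentages are left as optional fields because the standard library has no portable call for them; disk usage comes from `shutil.disk_usage`.

```python
import json
import shutil
from dataclasses import asdict, dataclass, field
from typing import Optional


def disk_used_percent(path: str = ".") -> float:
    """Return the used share of the disk holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return round(100.0 * usage.used / usage.total, 1)


@dataclass
class ExperimentRecord:
    """One experiment run (hypothetical schema mirroring the list above)."""
    model_params: dict
    learning_rate: float
    epochs_run: int
    training_loss: float
    validation_loss: float
    # CPU and memory figures must be supplied by whatever monitoring tool you use.
    cpu_percent: Optional[float] = None
    memory_percent: Optional[float] = None
    disk_used_percent: float = field(default_factory=disk_used_percent)


def to_json(record: ExperimentRecord) -> str:
    """Serialize a record so runs can be appended to a log file and compared."""
    return json.dumps(asdict(record), sort_keys=True)


# Made-up values, for illustration only.
record = ExperimentRecord(
    model_params={"hidden_units": 64},
    learning_rate=0.001,
    epochs_run=10,
    training_loss=0.42,
    validation_loss=0.47,
)
print(to_json(record))
```

Writing one JSON line per run makes it easy to diff two experiments later.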




if readable():
    be_happy()
else:
    refactor()
#http://msdl.cs.mcgill.ca/people/shahla/misc/PythonConventions.pdf
#filenames: use short file names
#myfile.py
#class name, CapWords convention
#class MyClass:
#private and protected variables with _
# _myProtectedVar, _myPrivateVar
#imports on separate lines
#Bad
import sys, os
#Good
import sys
import os
#order of imports
#1. standard library
#2. major third-party packages
#3. application-specific imports
#indentation
#break long lines with \
#no multiple statements on a single line
#Bad
if foo == 'blah': doBlahThing()
#Good
if foo == 'blah':
    doBlahThing()
#No whitespace before an opening parenthesis or bracket
#bad
spam (1)
dict ['key']
#good
spam(1)
dict['key']
#no whitespace before comma, colon
#bad
if x == 4 :
    print(x , y)
    x , y = y , x
#good
if x == 4:
    print(x, y)
    x, y = y, x
#no extra spaces to align assignment operators
#bad
x         = 1
operatorA = 2
cab       = 3
#good
x = 1
operatorA = 2
cab = 3
#compare against None explicitly; do not rely on truthiness
#bad
if x:
    y = 6
#good
if x is not None:
    y = 6
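
Why the explicit `is not None` test matters: values such as `0`, `""` and `[]` are falsy but are not `None`, so the two conditions disagree on them. A small sketch (the helper name `describe` is made up for this note):

```python
def describe(x):
    """Return (truthiness of x, whether x is not None)."""
    return bool(x), x is not None


# 0, "" and [] are all falsy, yet none of them is None.
for value in (None, 0, "", [], 6):
    print(repr(value), describe(value))
```

So `if x:` silently skips a legitimate value like `0`, while `if x is not None:` only skips a missing value.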
#http://www.cs.rpi.edu/academics/courses/fall18/csci1200/Good_Programming_Practices.pdf
#uppercase constants
#GRAVITY
#capitalize the first letter of class names
#Person()
#private and protected names start with _
#_speed
#Variables
#Avoid global variables
#instead of public variables use getters and setters
class Person():
    def __init__(self, name):
        self.name = name
    def getName(self):
        return self.name
    def setName(self, name):
        self.name = str(name)
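
The getter and setter above follow the linked guide. In Python the same idea is often written with the built-in `property` decorator, which keeps plain attribute syntax for callers while still running the conversion. A minimal sketch, with the sample name chosen only for illustration:

```python
class Person:
    def __init__(self, name):
        # This assignment goes through the setter defined below.
        self.name = name

    @property
    def name(self):
        """Read access: person.name"""
        return self._name

    @name.setter
    def name(self, value):
        """Write access: person.name = value (always stored as str)."""
        self._name = str(value)


person = Person("Ada")
person.name = 42          # converted by the setter
print(person.name)        # prints 42, stored as the string "42"
```

Callers never touch `_name` directly, so validation can be added to the setter later without changing any calling code.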
#Avoid deep nesting: return early instead
def work_check(word):
    if len(word) < 5:
        return False
    if len(word) % 2 == 0:
        return False
    if word[0] != 'a':
        return False
    return True
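
For comparison, here is the same kind of check written both ways. The function names `word_check_nested` and `word_check_flat` are made up for this note; the flat version assumes that a word passing all three tests should be accepted.

```python
def word_check_nested(word):
    # Deeply nested version: each condition adds one indentation level.
    if len(word) >= 5:
        if len(word) % 2 != 0:
            if word[0] == 'a':
                return True
    return False


def word_check_flat(word):
    # Early-return version: each failed condition exits immediately.
    if len(word) < 5:
        return False
    if len(word) % 2 == 0:
        return False
    if word[0] != 'a':
        return False
    return True


samples = ["", "abc", "apple", "banana", "eagle", "avocado"]
for sample in samples:
    assert word_check_nested(sample) == word_check_flat(sample)
print([s for s in samples if word_check_flat(s)])
```

Both functions give the same answers; the flat one is easier to read and to extend with another condition.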
#Exception Handling
#https://www.datacamp.com/community/tutorials/exception-handling-python
try:
    a = 100 / 0
    print(a)
except ZeroDivisionError:
    print("Zero Division Exception Raised.")
else:
    print("Success, no error!")
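
The same try/except/else pattern can be wrapped in a function and extended with `finally`, which runs whether or not an exception was raised. The function name `safe_divide` and the choice to return `None` on failure are assumptions made for this sketch.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers; return None instead of raising on division by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as exc:
        # Only the expected error is caught; anything else still propagates.
        print(f"Zero Division Exception Raised: {exc}")
        return None
    else:
        # Runs only when the try block raised nothing.
        print("Success, no error!")
        return result
    finally:
        # Runs in both cases, before the function returns.
        print("Done.")


print(safe_divide(100, 4))
print(safe_divide(100, 0))
```

Catching the specific `ZeroDivisionError` rather than a bare `except:` keeps unrelated bugs visible.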
#https://python.g-node.org/python-autumnschool-2010/_media/materials/day0-haenel-best-practices.pdf
#https://gist.github.com/ericmjl/27e50331f24db3e8f957d1fe7bbbe510
#https://github.com/bast/somepackage
#https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
#https://docs.python-guide.org/writing/structure/
#https://towardsdatascience.com/manage-your-data-science-project-structure-in-early-stage-95f91d4d0600
#https://github.com/Azure/Azure-TDSP-ProjectTemplate
#https://drivendata.github.io/cookiecutter-data-science/
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
├── models             <- Trained and serialized models, model predictions, or model summaries
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
├── setup.py           <- Make this project pip installable with `pip install -e`
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org
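
The cookiecutter template linked above generates this layout for you. If you only want empty folders and placeholder files to experiment with, a rough stand-in can be written with `pathlib`. The function name `create_skeleton` and the project name `my_project` are made up for this sketch, and it creates empty files only, not the template's actual contents.

```python
import tempfile
from pathlib import Path

# Directory and file names copied from the tree above.
DIRECTORIES = [
    "data/external", "data/interim", "data/processed", "data/raw",
    "docs", "models", "notebooks", "references", "reports/figures",
    "src/data", "src/features", "src/models", "src/visualization",
]
FILES = [
    "LICENSE", "Makefile", "README.md", "requirements.txt", "setup.py",
    "tox.ini", "src/__init__.py", "src/data/make_dataset.py",
    "src/features/build_features.py", "src/models/predict_model.py",
    "src/models/train_model.py", "src/visualization/visualize.py",
]


def create_skeleton(root):
    """Create the empty project layout under `root` and return its path."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        (root / file_name).touch()
    return root


# Build the layout in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(Path(tmp) / "my_project")
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))

print("\n".join(created))
```

Keeping `data/raw` separate from `data/processed` is the main point of the layout: raw data stays untouched and every later step can be regenerated from it.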
#https://www.datacamp.com/community/tutorials/inner-classes-python
#https://www.datacamp.com/community/tutorials/python-data-type-conversion
#https://github.blog/2015-01-21-how-to-write-the-perfect-pull-request/
Happy Learning!!!
