Utility
Diagnostic tools to find information about data.
impyute.util.find_null(data)
Finds the indices of all missing values.
Parameters: - data: numpy.ndarray
Data to impute.
Returns: - List of tuples
Indices of all missing values in tuple format; (i, j)
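As a rough illustration of the return format described above, here is a minimal NumPy sketch. It assumes that missing values are encoded as NaN in a 2-D float array; `find_null_sketch` is a hypothetical stand-in, not the library's implementation.

```python
import numpy as np

def find_null_sketch(data):
    """Return (i, j) index tuples of every NaN in a 2-D array (illustrative only)."""
    # np.isnan marks the missing entries; np.argwhere lists their coordinates in row-major order
    return [tuple(int(k) for k in idx) for idx in np.argwhere(np.isnan(data))]

data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])
null_xy = find_null_sketch(data)
print(null_xy)  # [(0, 1), (1, 0)]
```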
impyute.util.describe(data)
Print input/output multiple times.
Parameters: - data: numpy.ndarray
The data you want to get a description from
- verbose: boolean(optional)
Decides whether the description is short or long form
Returns: - dict
- missingness: list
Confidence interval of data being MCAR, MAR or MNAR - in that order
- null_xy: list of tuples
Indices of all null points
- null_n: list
Total number of null values for each column
- pmissing_n: float
Percentage of missing values in dataset
- null_rows: list
Indices of all rows that are completely null
- null_cols: list
Indices of all columns that are completely null
- mean_rows: list
Mean value of each row
- mean_cols: list
Mean value of each column
- std_dev: list
Standard deviation for each row/column
- min_max: list
Finds the minimum and maximum for each row
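Several of the fields listed above can be computed directly with NumPy's NaN-aware functions. The sketch below is only an approximation under stated assumptions: missing values are NaN, `pmissing_n` is expressed as a fraction between 0 and 1, and `describe_sketch` is a hypothetical name. It covers a subset of the keys and does not attempt the `missingness` estimate.

```python
import numpy as np

def describe_sketch(data):
    """Build a partial description dict for a 2-D float array (illustrative only)."""
    null_mask = np.isnan(data)

    # Column means ignore NaN; columns that are entirely null keep NaN as their mean
    cols_with_data = ~null_mask.all(axis=0)
    mean_cols = np.full(data.shape[1], np.nan)
    mean_cols[cols_with_data] = np.nanmean(data[:, cols_with_data], axis=0)

    return {
        "null_xy": [tuple(int(k) for k in idx) for idx in np.argwhere(null_mask)],
        "null_n": null_mask.sum(axis=0).tolist(),                   # nulls per column
        "pmissing_n": float(null_mask.mean()),                      # fraction of all cells
        "null_rows": np.where(null_mask.all(axis=1))[0].tolist(),   # fully null rows
        "null_cols": np.where(null_mask.all(axis=0))[0].tolist(),   # fully null columns
        "mean_cols": mean_cols.tolist(),
    }

data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, np.nan, np.nan],
                 [7.0, np.nan, 9.0]])
summary = describe_sketch(data)
print(summary["null_n"], summary["null_rows"], summary["null_cols"])
```

Here row 1 and column 1 are completely null, so both appear in `null_rows` and `null_cols` respectively.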
impyute.util.count_missing(data)
Calculate the total percentage of missing values and also the percentage in each column.
Parameters: - data: numpy.ndarray
Data to impute.
Returns: - dict
Percentage of missing values in total and in each column.
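The calculation described above amounts to averaging a NaN mask over the whole array and over each column. The following sketch assumes NaN-encoded missing values; the function name `count_missing_sketch` and the dict keys (`"total"` plus integer column indices) are hypothetical choices made for illustration, since the key layout is not specified here.

```python
import numpy as np

def count_missing_sketch(data):
    """Return percentages of missing values overall and per column (illustrative only)."""
    null_mask = np.isnan(data)
    # The mean of a boolean mask is the fraction of True entries
    result = {"total": float(null_mask.mean() * 100)}
    for j, pct in enumerate(null_mask.mean(axis=0) * 100):
        result[j] = float(pct)
    return result

data = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, np.nan],
                 [7.0, 8.0]])
pct = count_missing_sketch(data)
print(pct)  # {'total': 37.5, 0: 25.0, 1: 50.0}
```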
impyute.util.checks(fn)
Main check function to ensure input is correctly formatted.
Parameters: - data: numpy.ndarray
Data to impute.
Returns: - bool
True if data is correctly formatted
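The `fn` argument in the signature suggests that this function wraps another function and validates its `data` argument. The sketch below shows one way such a wrapper could look; it is an assumption, not the library's code. The specific rules (NumPy array, two dimensions, float dtype) and the names `checks_sketch` and `count_nulls` are hypothetical.

```python
import functools
import numpy as np

def checks_sketch(fn):
    """Wrap fn so that its first argument is validated before fn runs (illustrative only)."""
    @functools.wraps(fn)
    def wrapper(data, *args, **kwargs):
        # Assumed rules: a 2-D NumPy float array, so NaN can mark missing values
        if not isinstance(data, np.ndarray):
            raise TypeError("data must be a numpy.ndarray")
        if data.ndim != 2:
            raise ValueError("data must be 2-dimensional")
        if not np.issubdtype(data.dtype, np.floating):
            raise TypeError("data must have a float dtype")
        return fn(data, *args, **kwargs)
    return wrapper

@checks_sketch
def count_nulls(data):
    # Hypothetical function that relies on the validation above
    return int(np.isnan(data).sum())

n_nulls = count_nulls(np.array([[1.0, np.nan], [3.0, 4.0]]))
print(n_nulls)  # 1
```

Passing a plain Python list to `count_nulls` raises `TypeError` before the wrapped function body runs.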
impyute.util.compare(imputed, classifiers=['sklearn.svm.SVC'], log_path=None)
Given an imputed dataset with labels and a list of supervised machine learning models, find the accuracy score of all model/imputation pairs.
Parameters: - imputed: [(str, np.ndarray), (str, np.ndarray)…]
List of tuples of the form (imputation_name, imputed_data), where imputation_name is a string and imputed_data is a tuple in which imputed_data[0] is the data, X, and imputed_data[1] is the labels, y.
- classifiers: [str, str…str] (optional)
A list of classifiers to run the imputed data sets on. Right now it ONLY works with sklearn, and each entry should have the form sklearn.SUBMODULE.FUNCTION. More generally, the form is ‘MODULE.SUBMODULE.FUNCTION’. If providing a custom classifier, add its file location to sys.path first, and structure the classifier like an sklearn one (with a fit and a predict method).
- log_path: str (optional)
To write results to a file, provide a relative path.
Returns: - results.txt
Classification results on imputed data
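The ‘MODULE.SUBMODULE.FUNCTION’ strings described above can be resolved to Python objects with the standard library's `importlib`. The helper below is a sketch of that general technique and not necessarily how `compare` does it; `load_by_dotted_path` is a hypothetical name, and a standard-library class is used so that the example runs without scikit-learn.

```python
import importlib

def load_by_dotted_path(path):
    """Resolve a 'module.submodule.Name' string to the object it names (illustrative only)."""
    # Split off the final component: everything before it is the module path
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted path such as 'MODULE.SUBMODULE.FUNCTION'")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

cls = load_by_dotted_path("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

With scikit-learn installed, `load_by_dotted_path("sklearn.svm.SVC")()` would give a classifier instance exposing `fit` and `predict`. A custom classifier resolves the same way once its directory has been added to `sys.path`.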