# Utility

Diagnostic tools to find information about data.

impyute.util.find_null(data)

Finds the indices of all missing values.

Parameters:

- data: numpy.ndarray. Data to impute.

Returns:

- List of tuples. Indices of all missing values, in (i, j) tuple format.
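
The behaviour described above can be approximated with plain NumPy. The following is a minimal sketch, not the library's implementation: the name `find_null_sketch` is hypothetical, and it assumes missing values are encoded as NaN in a 2-D float array.

```python
import numpy as np


def find_null_sketch(data):
    """Return (i, j) index tuples for every NaN entry of a 2-D array."""
    # np.argwhere gives one [row, col] pair per True entry, in row-major order.
    return [(int(i), int(j)) for i, j in np.argwhere(np.isnan(data))]


data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])
print(find_null_sketch(data))  # [(0, 1), (1, 0)]
```
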

impyute.util.describe(data)

Describes the data, reporting where values are missing along with basic summary statistics.

Parameters:

- data: numpy.ndarray. The data you want a description of.
- verbose: boolean (optional). Decides whether the description is short or long form.

Returns:

- dict containing:
  - missingness: list. Confidence interval of data being MCAR, MAR or MNAR, in that order.
  - null_xy: list of tuples. Indices of all null points.
  - null_n: list. Total number of null values for each column.
  - pmissing_n: float. Percentage of missing values in the dataset.
  - null_rows: list. Indices of all rows that are completely null.
  - null_cols: list. Indices of all columns that are completely null.
  - mean_rows: list. Mean value of each row.
  - mean_cols: list. Mean value of each column.
  - std_dev: list. Standard deviation for each row/column.
  - min_max: list. Minimum and maximum for each row.
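
To make some of the returned fields concrete, here is a hedged sketch that computes a handful of them with NumPy. The function name `describe_sketch` is hypothetical, only five of the listed fields are covered, and it assumes NaN marks a missing value and that `pmissing_n` is on a 0 to 100 scale.

```python
import numpy as np


def describe_sketch(data):
    """Compute a few of the summary fields for a 2-D float array."""
    mask = np.isnan(data)  # True wherever a value is missing
    return {
        "null_xy": [(int(i), int(j)) for i, j in np.argwhere(mask)],
        "null_n": mask.sum(axis=0).tolist(),                   # nulls per column
        "pmissing_n": float(100.0 * mask.mean()),              # percent of all entries
        "null_rows": np.where(mask.all(axis=1))[0].tolist(),   # fully null rows
        "null_cols": np.where(mask.all(axis=0))[0].tolist(),   # fully null columns
    }


sample = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, np.nan],
                   [7.0, np.nan, 9.0],
                   [10.0, np.nan, 12.0]])
summary = describe_sketch(sample)
print(summary["null_n"])      # [1, 4, 1]
print(summary["pmissing_n"])  # 50.0
print(summary["null_rows"])   # [1]
print(summary["null_cols"])   # [1]
```
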

impyute.util.count_missing(data)

Calculate the total percentage of missing values and also the percentage in each column.

Parameters:

- data: np.array. Data to impute.

Returns:

- dict. Percentage of missing values in total and in each column.
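
The calculation can be sketched as follows. This is an illustration under assumptions, not the library's code: the name `count_missing_sketch`, the `"total"` key, the use of column indices as keys, and the 0 to 100 scale are all choices made for this example.

```python
import numpy as np


def count_missing_sketch(data):
    """Percentage of NaN entries overall and per column of a 2-D array."""
    mask = np.isnan(data)
    # Overall share of missing entries, expressed as a percentage.
    result = {"total": float(100.0 * mask.mean())}
    # One entry per column, keyed by the column index.
    for j in range(data.shape[1]):
        result[j] = float(100.0 * mask[:, j].mean())
    return result


data = np.array([[1.0, np.nan],
                 [2.0, np.nan],
                 [np.nan, 4.0],
                 [5.0, 6.0]])
print(count_missing_sketch(data))  # {'total': 37.5, 0: 25.0, 1: 50.0}
```
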

impyute.util.checks(fn)

Main check function to ensure input is correctly formatted

Parameters:

- data: numpy.ndarray. Data to impute.

Returns:

- bool. True if data is correctly formatted.
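
Because the function takes `fn` as its argument, it reads as a decorator that validates input before the wrapped function runs. The sketch below shows that general pattern only. The validation rules, the name `checks_sketch`, and the locally defined `BadInputError` stand-in are assumptions for illustration, and the checks the library actually performs may differ.

```python
import functools

import numpy as np


class BadInputError(Exception):
    """Local stand-in for the library's exception, defined for this sketch."""


def checks_sketch(fn):
    """Decorator that validates the first argument before calling fn."""
    @functools.wraps(fn)
    def wrapper(data, *args, **kwargs):
        # Hypothetical rules, chosen only to illustrate the pattern.
        if not isinstance(data, np.ndarray):
            raise BadInputError("data must be a numpy.ndarray")
        if data.ndim != 2:
            raise BadInputError("data must be 2-dimensional")
        if not np.issubdtype(data.dtype, np.floating):
            raise BadInputError("data must have a float dtype")
        return fn(data, *args, **kwargs)
    return wrapper


@checks_sketch
def column_means(data):
    """Mean of each column, ignoring NaN entries."""
    return np.nanmean(data, axis=0)


print(column_means(np.array([[1.0, np.nan], [3.0, 4.0]])).tolist())  # [2.0, 4.0]

try:
    column_means([[1.0, 2.0]])  # a plain list, so validation fails
except BadInputError as err:
    print("rejected:", err)
```
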

impyute.util.compare(imputed, classifiers=['sklearn.svm.SVC'], log_path=None)

Given an imputed dataset with labels and a list of supervised machine learning models, finds the accuracy score of every model/imputation pair.

Parameters:

- imputed: [(str, np.ndarray), (str, np.ndarray)…]. List of tuples containing (imputation_name, imputed_data), where imputation_name is a string and imputed_data is a tuple in which imputed_data[0] is the data, X, and imputed_data[1] is the label, y.
- classifiers: [str, str…str] (optional). List of classifiers to run the imputed data sets on. Right now, it ONLY works with sklearn, and the format should be sklearn.SUBMODULE.FUNCTION. More generally, it's 'MODULE.SUBMODULE.FUNCTION'. If providing a custom classifier, make sure to add the file location to sys.path first; the classifier should also be structured like an sklearn one (with a fit and predict method).
- log_path: str (optional). To write results to a file, provide a relative path.

Returns:

- results.txt. Classification results on imputed data.
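
The 'MODULE.SUBMODULE.FUNCTION' string format can be resolved to a Python object with the standard library's `importlib`. The helper below, `load_by_path`, is a hypothetical name and only one plausible way to do this; how the library itself resolves the string is not stated here. A standard-library class is used as the target so the sketch runs without scikit-learn installed.

```python
import importlib


def load_by_path(dotted_path):
    """Resolve a 'MODULE.SUBMODULE.NAME' string to the object it names."""
    # Split off the final component: everything before it is the module path.
    module_path, _, name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)


# Stand-in target from the standard library. A string such as
# "sklearn.svm.SVC" would be resolved the same way if scikit-learn is installed.
OrderedDict = load_by_path("collections.OrderedDict")
print(OrderedDict.__name__)  # OrderedDict
```

For a custom classifier, the module has to be importable first, which is why its location needs to be on `sys.path` before the string is resolved.
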

exception impyute.util.BadInputError(value)

impyute.util.preprocess(fn)

Base preprocess function for commonly used preprocessing

Parameters:

- data: numpy.ndarray. Data to impute.

Returns:

- bool. True if data is correctly formatted.