Utility

Diagnostic tools to find information about data.

impyute.util.find_null(data)

Finds the indices of all missing values.

Parameters:	data: numpy.ndarray Data to impute.
Returns:	List of tuples Indices of all missing values in tuple format; (i, j)
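
As an illustration of the return format, here is a minimal NumPy-only sketch that collects (i, j) index tuples for every NaN entry. The name find_null_sketch is hypothetical, and this is not necessarily how impyute implements find_null; it assumes a 2-D float array in which missing values are encoded as NaN.

```python
import numpy as np


def find_null_sketch(data):
    """Return (row, column) index tuples of every NaN entry in a 2-D array."""
    # np.argwhere yields one [i, j] pair per True entry, in row-major order.
    return [tuple(int(k) for k in idx) for idx in np.argwhere(np.isnan(data))]


data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])
print(find_null_sketch(data))  # [(0, 1), (1, 0)]
```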

impyute.util.describe(data)

Summarize the missing values and basic statistics of the data.

Parameters:

data: numpy.ndarray: The data to describe
verbose: boolean(optional): Decides whether the description is short or long form

Returns:

dict

missingness: list: Confidence interval of data being MCAR, MAR or MNAR - in that order
null_xy: list of tuples: Indices of all null points
null_n: list: Total number of null values for each column
pmissing_n: float: Percentage of missing values in dataset
null_rows: list: Indices of all rows that are completely null
null_cols: list: Indices of all columns that are completely null
mean_rows: list: Mean value of each row
mean_cols: list: Mean value of each column
std_dev: list: Standard deviation of each row/column
min_max: list: Minimum and maximum of each row
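
A few of the fields listed above can be sketched with plain NumPy. This is an assumption-laden illustration, not the library's code: the function name describe_sketch is hypothetical, only four of the keys are reproduced, and missing values are assumed to be NaN in a 2-D float array.

```python
import numpy as np


def describe_sketch(data):
    """Compute a subset of the summary fields for a 2-D float array."""
    mask = np.isnan(data)  # True where a value is missing
    return {
        # Number of missing values in each column.
        "null_n": mask.sum(axis=0).tolist(),
        # Share of missing values in the whole array, as a percentage.
        "pmissing_n": float(mask.mean() * 100),
        # Rows and columns in which every value is missing.
        "null_rows": np.flatnonzero(mask.all(axis=1)).tolist(),
        "null_cols": np.flatnonzero(mask.all(axis=0)).tolist(),
    }


data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, np.nan, np.nan],
                 [7.0, np.nan, 9.0]])
summary = describe_sketch(data)
print(summary["null_n"])     # [1, 3, 1]
print(summary["null_rows"])  # [1]
print(summary["null_cols"])  # [1]
```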

impyute.util.count_missing(data)

Calculate the total percentage of missing values and also the percentage in each column.

Parameters:	data: numpy.ndarray Data to impute.
Returns:	dict Percentage of missing values in total and in each column.
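
The percentages can be illustrated with a short NumPy sketch. The function name count_missing_sketch and the dictionary keys "total" and "columns" are hypothetical; the keys that impyute actually returns are not stated here. Missing values are assumed to be NaN.

```python
import numpy as np


def count_missing_sketch(data):
    """Return the percentage of NaN values overall and per column."""
    mask = np.isnan(data)
    return {
        # Mean of a boolean mask is the fraction of True entries.
        "total": float(mask.mean() * 100),
        "columns": [float(p) for p in mask.mean(axis=0) * 100],
    }


data = np.array([[1.0, np.nan],
                 [3.0, np.nan],
                 [np.nan, 6.0],
                 [7.0, 8.0]])
print(count_missing_sketch(data))
# {'total': 37.5, 'columns': [25.0, 50.0]}
```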

impyute.util.checks(fn)

Main check function to ensure input is correctly formatted

Parameters:	data: numpy.ndarray Data to impute.
Returns:	bool True if data is correctly formatted
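
The signature takes a function, fn, which suggests that checks is meant to wrap other functions as a decorator. The sketch below shows that general pattern only. The name checks_sketch and the specific validation rules (a 2-D floating-point numpy.ndarray) are assumptions made for illustration, not the checks impyute performs.

```python
import functools

import numpy as np


def checks_sketch(fn):
    """Validate the first argument before calling the wrapped function."""
    @functools.wraps(fn)
    def wrapper(data, *args, **kwargs):
        # Assumed rules: the input must be a 2-D array of floats.
        if not isinstance(data, np.ndarray):
            raise TypeError("data must be a numpy.ndarray")
        if data.ndim != 2:
            raise ValueError("data must be two-dimensional")
        if not np.issubdtype(data.dtype, np.floating):
            raise TypeError("data must have a floating-point dtype")
        return fn(data, *args, **kwargs)
    return wrapper


@checks_sketch
def column_means(data):
    """Hypothetical wrapped function: per-column mean, ignoring NaN."""
    return np.nanmean(data, axis=0)


print(column_means(np.array([[1.0, np.nan], [3.0, 4.0]])))
```

Placing the validation in a decorator keeps each wrapped function free of repeated input checks.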

impyute.util.compare(imputed, classifiers=['sklearn.svm.SVC'], log_path=None)

Given a list of imputed, labeled datasets and a list of supervised machine learning models, find the accuracy score of every model/imputation pair.

Parameters:

imputed: [(str, np.ndarray), (str, np.ndarray)…]: List of tuples of the form (imputation_name, imputed_data), where imputation_name is a string and imputed_data is a tuple in which imputed_data[0] is the data, X, and imputed_data[1] is the labels, y
classifiers: [str, str…str] (optional): List of classifiers to run the imputed datasets on. At present this ONLY works with sklearn, and each entry should have the form sklearn.SUBMODULE.FUNCTION. More generally, the form is ‘MODULE.SUBMODULE.FUNCTION’. If you provide a custom classifier, add its file location to sys.path first, and make sure it is structured like an sklearn classifier (with a fit and a predict method).
log_path: str (optional): To write results to a file, provide a relative path

Returns:
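
The ‘MODULE.SUBMODULE.FUNCTION’ format of the classifiers parameter can be resolved to an object with importlib. The sketch below shows one common way to do this; the helper name resolve_dotted_path is hypothetical, and it is not confirmed that compare resolves its classifier strings this way. A standard-library class is used so that the example runs without scikit-learn installed.

```python
import importlib


def resolve_dotted_path(path):
    """Import the module part of a dotted path and return the named attribute."""
    # Split "pkg.sub.Name" into the module "pkg.sub" and the attribute "Name".
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError("expected a path of the form MODULE.NAME")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in for a classifier path such as "sklearn.svm.SVC".
cls = resolve_dotted_path("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

A custom classifier would resolve the same way once the directory containing its module has been added to sys.path.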

impyute.util.preprocess(fn)

Base preprocess function for commonly used preprocessing

Parameters:	data: numpy.ndarray Data to impute.
Returns:	bool True if data is correctly formatted
