Addresses #1602. Adds a method to `analysis/error_analysis` that wraps the `get_label_buckets` functionality. Given a bucket, a NumPy array `x` of your data, and the corresponding label array(s) `y`, it returns `x` filtered down to only the instances in that bucket.
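The idea of "wrapping the bucket functionality" can be sketched as follows. This is a minimal, self-contained illustration, not the code added in this PR: `label_buckets` is a hypothetical stand-in that groups example indices by their tuple of labels (for example `(gold, pred)`), and `instances_in_bucket` is a hypothetical name for the wrapper that uses those indices to select rows of `x`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np


def label_buckets(*y: np.ndarray) -> Dict[Tuple[int, ...], np.ndarray]:
    """Hypothetical stand-in for bucketing: map each label tuple to example indices."""
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    # zip(*y) yields one tuple of labels per example, e.g. (gold_i, pred_i).
    for i, labels in enumerate(zip(*y)):
        buckets[tuple(int(lab) for lab in labels)].append(i)
    return {key: np.array(idx, dtype=int) for key, idx in buckets.items()}


def instances_in_bucket(bucket: Tuple[int, ...], x: np.ndarray, *y: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper: return only the rows of x whose labels match `bucket`."""
    indices = label_buckets(*y).get(bucket, np.array([], dtype=int))
    # Indexing with an empty int array yields an empty array with x's trailing shape.
    return x[indices]


x = np.array([[0.1], [0.2], [0.3], [0.4]])
gold = np.array([1, 0, 1, 1])
pred = np.array([1, 1, 0, 1])
print(instances_in_bucket((1, 0), x, gold, pred))  # [[0.3]]
```

Here the bucket `(1, 0)` selects the examples whose gold label is 1 but were predicted as 0, which is the kind of slice typically inspected during error analysis.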
```python
from typing import Tuple

import numpy as np
import pandas as pd


def filter_unlabeled_dataframe(
    X: pd.DataFrame, y: np.ndarray, L: np.ndarray
) -> Tuple[pd.DataFrame, np.ndarray]:
    """Filter out examples not covered by any labeling function.

    Parameters
    ----------
    X
        Data points in a Pandas DataFrame.
    y
        Matrix of probabilities output by label model's predict_proba method.
    L
        Matrix of labels emitted by LFs.

    Returns
    -------
    pd.DataFrame
        Data points that were labeled by at least one LF in L.
    np.ndarray
        Probabilities matrix for data points labeled by at least one LF in L.
    """
    # An LF abstains with -1; keep rows where at least one LF emitted a label.
    mask = (L != -1).any(axis=1)
    return X.iloc[mask], y[mask]
```
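A small usage sketch of `filter_unlabeled_dataframe`, assuming the Snorkel convention that an abstaining LF is encoded as `-1` in `L`. The function body is repeated so the snippet runs on its own; the toy `X`, `L`, and `y` values are made up for illustration.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def filter_unlabeled_dataframe(
    X: pd.DataFrame, y: np.ndarray, L: np.ndarray
) -> Tuple[pd.DataFrame, np.ndarray]:
    # Keep rows where at least one LF did not abstain (abstain assumed to be -1).
    mask = (L != -1).any(axis=1)
    return X.iloc[mask], y[mask]


# Toy inputs: 3 data points, 2 LFs, 2 classes.
X = pd.DataFrame({"text": ["a", "b", "c"]})
L = np.array([
    [-1, -1],  # no LF fired, so this row is dropped
    [0, -1],
    [-1, 1],
])
y = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])

X_filtered, y_filtered = filter_unlabeled_dataframe(X, y, L)
print(X_filtered["text"].tolist())  # ['b', 'c']
print(y_filtered.shape)  # (2, 2)
```

Note that `X.iloc[mask]` keeps the original index labels (here `1` and `2`), so callers that expect a contiguous index may want to call `reset_index(drop=True)` on the result.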