Updates (2020.08.16)
This update is mainly about miscellaneous fixes, but I've also introduced a toy example to reveal the power of carefree-learn - the famous Titanic competition!
Here is the source code:
import os
import cflearn

from cfdata.tabular import *

file_folder = os.path.dirname(__file__)


def test():
    train_file = os.path.join(file_folder, "train.csv")
    test_file = os.path.join(file_folder, "test.csv")
    data_config = {"label_name": "Survived"}
    # Search for good hyper parameters with the HPO API
    hpo = cflearn.tune_with(
        train_file,
        model="tree_dnn",
        temp_folder="__hpo__",
        task_type=TaskTypes.CLASSIFICATION,
        data_config=data_config,
        num_parallel=0,
    )
    # Train 10 models with the best hyper parameters found above
    results = cflearn.repeat_with(
        train_file,
        **hpo.best_param,
        models="tree_dnn",
        temp_folder="__repeat__",
        num_repeat=10,
        num_jobs=0,
        data_config=data_config,
    )
    # Ensemble the 10 trained models and predict on the test file directly
    ensemble = cflearn.EnsemblePattern(results.patterns["tree_dnn"])
    predictions = ensemble.predict(test_file).ravel()
    # Recover the PassengerId column (the first column) for the submission
    x_te, _ = results.transformer.data.read_file(test_file, contains_labels=False)
    id_list = DataTuple.with_transpose(x_te, None).xT[0]
    # Score: achieved ~0.79
    with open("submissions.csv", "w") as f:
        f.write("PassengerId,Survived\n")
        for test_id, prediction in zip(id_list, predictions):
            f.write(f"{test_id},{prediction}\n")


if __name__ == "__main__":
    test()
As you can see, carefree-learn doesn't need explicit data preprocessing - it can take files as inputs and predict on files directly! Moreover, common practices such as hyper-parameter tuning (cflearn.tune_with) and ensembling (cflearn.repeat_with & cflearn.EnsemblePattern) can be completed in a few lines of code. These APIs also hide other common practices (such as cross validation) under the hood, so the final performance is quite promising: I achieved ~0.79, and the best run achieved 0.81+, which is almost SOTA among other (more complicated) neural network solutions [1][2][3][4].
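For the curious, here is a minimal, hypothetical sketch of what the ensembling step boils down to. It is not cflearn's actual internals - `models` and `predict` stand in for the 10 repeated patterns above - but for a binary label, a majority vote over the individual predictions captures the idea:

import numpy as np

# Hypothetical sketch of majority-vote ensembling over binary (0/1)
# predictions; `models` is any list of objects exposing `predict`,
# not cflearn's actual internals.
def majority_vote(models, x):
    # Shape: (num_models, num_samples)
    votes = np.stack([m.predict(x).ravel() for m in models])
    # With 0/1 votes, rounding the mean implements the majority vote
    return np.round(votes.mean(axis=0)).astype(int)

Averaging like this tends to cancel out the variance of the individual runs, which is why training with num_repeat=10 and ensembling usually beats a single model.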