As discussed on Gitter, while developing union I found a problem where the application deadlocks while resolving the partitioning or the computation of a DAG. The working branch is: https://github.com/iduartgomez/native_spark/tree/dev
The error can be reproduced by running this test:
#[test]
fn test_error() {
    let sc = CONTEXT.clone();
    let join = || {
        let col1 = vec![
            (1, ("A".to_string(), "B".to_string())),
            (2, ("C".to_string(), "D".to_string())),
            (3, ("E".to_string(), "F".to_string())),
            (4, ("G".to_string(), "H".to_string())),
        ];
        let col1 = sc.parallelize(col1, 4);
        let col2 = vec![
            (1, "A1".to_string()),
            (1, "A2".to_string()),
            (2, "B1".to_string()),
            (2, "B2".to_string()),
            (3, "C1".to_string()),
            (3, "C2".to_string()),
        ];
        let col2 = sc.parallelize(col2, 4);
        col2.join(col1.clone(), 4)
    };
    let join1 = join();
    let join2 = join();
    let res = join1.union(join2).unwrap().collect().unwrap();
    assert_eq!(res.len(), 12);
}
Inside one of the executors, a thread panics at this point:
let mut stream_r = std::io::BufReader::new(&mut stream);
let message_reader = serialize_packed::read_message(&mut stream_r, r).unwrap();
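
To help narrow this down, it might be useful to see the underlying read error rather than only the panic from `unwrap()`. Below is a minimal sketch of that idea using only the standard library. Note that `read_frame` and `handle_stream` are hypothetical names, and the length-prefixed framing is just a stand-in for `serialize_packed::read_message`, not the real wire format; I have not checked how this would fit into the executor code.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical stand-in for `serialize_packed::read_message`: reads a
/// 4-byte little-endian length prefix followed by that many payload bytes.
/// This framing is an assumption for illustration only.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Hypothetical handler: reads one frame and returns a failure as a value,
/// with the underlying I/O error in the message, instead of panicking.
fn handle_stream<S: Read>(mut stream: S) -> Result<Vec<u8>, String> {
    let mut stream_r = BufReader::new(&mut stream);
    read_frame(&mut stream_r).map_err(|e| format!("failed to read message: {}", e))
}
```

Calling `handle_stream` on a truncated input (for example a byte slice that announces three payload bytes but only contains one) returns an `Err` describing the failed read, so the caller can log it or report it back instead of the thread dying silently.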