
Table 2. Linkage metrics: TP (true positives), FP (false positives), FN (false negatives), TN (true negatives), and f-score for each classifier per pair of data sets

From: Application of data linkage techniques to Pacific Northwest commercial fishing injury and fatality data

Commercial Fishing Incident Database & Oregon Trauma Registry

Match Parameters: Incident Date, Incident State

Combinations (2966 * 11): 32,626

Golden Matches: 5

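| Classifier | Threshold | TP | FP | FN | TN | f-score |
| --- | --- | --- | --- | --- | --- | --- |
| Expectation/Conditional Maximization | 0.5 | 4 | 6 | 1 | 32,615 | 0.53 |
| Support vector machine | 0.5 | 0 | 0 | 5 | 32,621 | 0 |
| Naïve-Bayes | 0.005 | 5 | 29 | 0 | 32,592 | 0.26 |
| Logistic regression | 0.005 | 5 | 29 | 0 | 32,592 | 0.26 |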
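
The f-score column can be reproduced from the TP, FP, and FN counts, assuming it is the standard F1 score (the harmonic mean of precision and recall). The table does not state the formula, so this is an assumption, and the helper name `f_score` is hypothetical. A minimal sketch using the Commercial Fishing Incident Database & Oregon Trauma Registry rows:

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score; assumed (not confirmed by the table) to be the reported f-score."""
    if tp == 0:
        # With no true positives, precision and recall are zero or undefined;
        # the score is treated as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# (TP, FP, FN) per classifier, copied from the table rows above.
rows = {
    "Expectation/Conditional Maximization": (4, 6, 1),
    "Support vector machine": (0, 0, 5),
    "Naïve-Bayes": (5, 29, 0),
    "Logistic regression": (5, 29, 0),
}

scores = {name: round(f_score(*counts), 2) for name, counts in rows.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
```

Rounded to two decimals, these values agree with the f-score column. TN does not enter the calculation, which is why the score stays informative even though true negatives dominate every row.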

Commercial Fishing Incident Database & Vessel Casualty

Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude

Combinations (1315 * 524): 689,060

Golden Matches: 9

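| Classifier | Threshold | TP | FP | FN | TN | f-score |
| --- | --- | --- | --- | --- | --- | --- |
| Expectation/Conditional Maximization | 0.5 | 9 | 3 | 0 | 689,048 | 0.86 |
| Support vector machine | 0.5 | 8 | 0 | 1 | 689,051 | 0.94 |
| Naïve-Bayes | 0.005 | 9 | 3 | 0 | 689,048 | 0.86 |
| Logistic regression | 0.005 | 9 | 7 | 0 | 689,044 | 0.72 |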
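
Each block's counts can be cross-checked against its header lines: the four cells of a row should sum to the number of combinations (the product of the two record counts), and TP + FN should equal the number of golden matches. The sketch below applies this check to the Commercial Fishing Incident Database & Vessel Casualty rows; the `Row` class and `check_rows` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    classifier: str
    tp: int
    fp: int
    fn: int
    tn: int


def check_rows(n_left: int, n_right: int, golden: int, rows: list[Row]) -> list[str]:
    """Return a description of every row that is inconsistent with the header."""
    total = n_left * n_right  # every record pair is one candidate combination
    problems = []
    for r in rows:
        if r.tp + r.fp + r.fn + r.tn != total:
            problems.append(f"{r.classifier}: cells do not sum to {total}")
        if r.tp + r.fn != golden:
            problems.append(f"{r.classifier}: TP + FN != {golden} golden matches")
    return problems


# Values copied from the table above (1315 * 524 combinations, 9 golden matches).
rows = [
    Row("Expectation/Conditional Maximization", 9, 3, 0, 689_048),
    Row("Support vector machine", 8, 0, 1, 689_051),
    Row("Naïve-Bayes", 9, 3, 0, 689_048),
    Row("Logistic regression", 9, 7, 0, 689_044),
]

problems = check_rows(1315, 524, 9, rows)
print(problems or "all rows consistent")
```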

Commercial Fishing Incident Database & Nonfatal Injuries

Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude

Combinations (2966 * 232): 688,112

Golden Matches: 12

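| Classifier | Threshold | TP | FP | FN | TN | f-score |
| --- | --- | --- | --- | --- | --- | --- |
| Expectation/Conditional Maximization | 0.5 | 12 | 52 | 0 | 688,048 | 0.32 |
| Support vector machine | 0.5 | 0 | 0 | 12 | 688,100 | 0 |
| Naïve-Bayes | 0.005 | 12 | 52 | 0 | 688,048 | 0.32 |
| Logistic regression | 0.005 | 12 | 52 | 0 | 688,048 | 0.32 |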

Nonfatal Injuries & Vessel Casualty

Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude

Combinations (232 * 524): 121,568

Golden Matches: 10

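| Classifier | Threshold | TP | FP | FN | TN | f-score |
| --- | --- | --- | --- | --- | --- | --- |
| Expectation/Conditional Maximization | 0.5 | 10 | 13 | 0 | 121,545 | 0.61 |
| Support vector machine | 0.5 | 9 | 1 | 1 | 121,557 | 0.90 |
| Naïve-Bayes | 0.01 | 10 | 2 | 0 | 121,556 | 0.91 |
| Logistic regression | 0.01 | 10 | 13 | 0 | 121,545 | 0.61 |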

Nonfatal Injuries & Oregon Trauma Registry

Match Parameters: Incident Date, Incident State

Combinations (232 * 11): 2552

Golden Matches: 4

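| Classifier | Threshold | TP | FP | FN | TN | f-score |
| --- | --- | --- | --- | --- | --- | --- |
| Expectation/Conditional Maximization | 0.2 | 4 | 3 | 0 | 2545 | 0.73 |
| Support vector machine | 0.5 | 0 | 0 | 4 | 2548 | 0 |
| Naïve-Bayes | 0.005 | 4 | 3 | 0 | 2545 | 0.73 |
| Logistic regression | 0.005 | 4 | 7 | 0 | 2541 | 0.53 |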

Vessel Casualty & Oregon Trauma Registry

Match Parameters: Incident Date, Incident State

Combinations (524 * 11): 5764

Golden Matches: 0