Discrimination of Marihuana Using Cluster Analysis
Syuji OKUYAMA and Toshiyuki MITSUI
Introduction
Recently in Japan, marihuana, cocaine and opium have been increasingly used
as drugs in place of methamphetamine. In this paper, in order to infer the
purchasing route of marihuana, discrimination among marihuana samples was
investigated using cluster analysis [1,2] and a personal computer program.
We have already shown that cluster analysis is a powerful technique for the
qualitative analysis of materials such as resins [3], medicines [4] and
fibers [5]. The advantage of this method is that the calculation is completed
on a personal computer, so the result is not biased by the interpreter or
prejudiced by the analyst. Using this method, the characteristics of each
sample can be approximately estimated.
The discrimination was performed on 43 marihuana samples that were seized in
Aichi Prefecture, Japan, in November 1994. These samples were measured using
gas chromatography (GC) and gas chromatography-mass spectrometry (GC-MS).
The obtained data were corrected according to a fixed rule, and cluster
analysis was performed using the corrected values. Discrimination among
marihuana samples can now be investigated by taking the results of this
cluster analysis into account.
Experimental
First, 5 ml of n-hexane was added to 50 mg of marihuana, and the mixture was
allowed to stand for ten minutes. The obtained solution was concentrated
to 1 ml using a water bath at 90 °C, and 1 ml of concentrated solution was
measured using GC and GC-MS. The GC peaks of cannabidiol (CBD),
tetrahydrocannabinol (THC) and cannabinol (CBN), and some GC-MS fragment
ions of tetrahydrocannabivarin (THCV), CBD and THC, were selected using
quantification IV [2]. Using the selected peaks and fragment ions,
discrimination among the marihuana samples was performed by cluster analysis.
The quantification IV and cluster analysis calculations were run on a
personal computer (NEC PC-9801 RX) programmed in BASIC.
Investigation of the analytical conditions of GC and GC-MS
The optimum conditions for GC and GC-MS were selected according to
reproducibility and operation time. The best analytical conditions for the
discrimination of marihuana are shown in Table 1 for GC-MS and in Table 2
for GC.
Table 1 Operating conditions of GC-MS
------------------------------------------------
Instrument            JEOL JMS-DX300
                      MS-GCG05
Column                1.5% Silicone OV-17
                      (2.5 mm i.d. x 1 m)
Column temp.          230 °C
Injection temp.       250 °C
Separator temp.       250 °C
Inlet temp.           250 °C
Chamber temp.         200 °C
Ionization voltage    70 V
Ionization current    300 µA
Carrier gas           He
------------------------------------------------
Table 2 Operating conditions of GC
------------------------------------------------
Instrument            HEWLETT PACKARD
                      5890 SERIES II
Column                DB1 (0.53 mm i.d. x 15 m)
Column temp.          230 °C
Injection temp.       250 °C
Detector temp.        250 °C
Carrier gas           He
------------------------------------------------
Reproducibility of GC and GC-MS measurements of the same sample
Discrimination among marihuana samples would be very difficult if the
contents of THCV, CBD, THC and CBN in marihuana changed with elapsed time and
the degree of change were random. Accordingly, the contents of THCV, CBD, THC
and CBN in the same marihuana sample were measured at different elapsed times
using the peak areas of GC and GC-MS. The measurement results were found to
be unchanged over three months.
Preparation of filed data
The peak areas of GC and GC-MS that were effective for cluster analysis were
selected from among all components in marihuana using quantification IV.
Quantification IV is a method for selecting data that are useful in cluster
analysis; here it selects the specific peak areas of several components for
the discrimination of marihuana. In this study, three peaks for GC and eight
mass fragment ions from three components for GC-MS were selected, and their
peak areas were read out. The three GC peaks were CBD, THC and CBN, and the
GC-MS fragment ions were m/z 271, 243 and 231 in THCV, 231 in CBD, and 314,
299, 271 and 231 in THC. The chromatograms of GC and GC-MS are shown in
Figures 1 and 2. Even where the areas of other GC peaks and fragment ions
were larger than those of the selected peaks and fragment ions, they hardly
differed among marihuana samples and were not useful in the cluster analysis.
Fig.1 Chromatogram of GC
CBD :Cannabidiol
THC :Tetrahydrocannabinol
CBN :Cannabinol
Fig.2 Total ion chromatogram of GC-MS
THCV:Tetrahydrocannabivarin
CBD :Cannabidiol
THC :Tetrahydrocannabinol
Among the selected peaks and fragment ions, THC for GC and 314 in THC for
GC-MS were used as the internal standards. For GC, the areas of CBD and CBN
were divided by the area of THC. For GC-MS, the area of each of the other
selected fragment ions was divided by the area of 314 in THC. Combining the
GC and GC-MS results, 9 values (two from GC and seven from GC-MS) were
obtained from each sample. The values were then normalized according to
Table 3 into eleven blocks from 0 to 10. Because the raw values divided by
the internal standard included experimental errors, this normalization gave
smaller errors than using the divided values directly. An example of these
calculations is shown in Table 4. The corrected values are shown in Table 5.
The cluster analysis was performed using this matrix.
Table 3 Correction of the data
------------------------------------------------
Value divided by            Normalized
internal standard (%)       number
------------------------------------------------
0                           0
0 - 1                       1
1 - 2.5                     2
2.5 - 5                     3
5 - 7.5                     4
7.5 - 10                    5
10 - 25                     6
25 - 50                     7
50 - 75                     8
75 - 100                    9
100 -                       10
------------------------------------------------
Table 4 Normalization method of peak area
--------------------------------------------------------------
Peak      Peak       Value divided    Normalized value
number    area       by I.S.          by Table 3
--------------------------------------------------------------
I.S.      494.8      ---              ---
1         517.4      1.046            10
2         0          0                0
3         243.2      0.492            7
4         0          0                0
5         0          0                0
6         379.2      0.766            9
7         21.4       0.043            3
I.S.      12438      ---              ---
8         485        0.039            3
9         63         0.005            1
--------------------------------------------------------------
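The normalization in Tables 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not the authors' BASIC program: the function name `normalize` and the list `UPPER_EDGES` are hypothetical, and the treatment of a value that falls exactly on a block edge (placed in the lower block here) is an assumption, since Table 3 does not state it.

```python
from bisect import bisect_left

# Upper edges (in %) of blocks 1-9 from Table 3; above 100 % is block 10.
UPPER_EDGES = [1, 2.5, 5, 7.5, 10, 25, 50, 75, 100]


def normalize(area, internal_standard_area):
    """Return the Table 3 block (0-10) for one peak area."""
    percent = 100.0 * area / internal_standard_area
    if percent == 0:
        return 0
    # Assumption: a value exactly on an edge falls in the lower block.
    return 1 + bisect_left(UPPER_EDGES, percent)


# Peak areas from Table 4: peaks 1-7 follow the first I.S. row (494.8),
# peaks 8-9 follow the second I.S. row (12438).
first_group = [517.4, 0, 243.2, 0, 0, 379.2, 21.4]
second_group = [485, 63]

row = [normalize(a, 494.8) for a in first_group]
row += [normalize(a, 12438) for a in second_group]
print(row)  # [10, 0, 7, 0, 0, 9, 3, 3, 1]
```

The printed row reproduces the normalized values in the last column of Table 4.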
Table 5 Filed data for cluster analysis
--------------------------------------------------------------------------
Sample  Peak number                        Sample  Peak number
number   1  2  3  4  5  6  7  8  9         number   1  2  3  4  5  6  7  8  9
--------------------------------------------------------------------------
   1    10  0  7  3  0  9  3  6  2           23    10  2  8  2  3  9  5  6  1
   2    10  0  7  0  0  9  3  3  1           24     9  3  7  3  0  9  6  5  3
   3    10  2  8  2  0  9  3  2  1           25    10  3  8  3  0  9  4  3  6
   4    10  2  8  2  3  9  6  6  8           26    10  2  8  3  0  9  2  5  6
   5    10  0  8  0  2  9  2  2  6           27     9  1  8  2  0 10  2  6  2
   6     9  0  8  0  0  9  3  3  3           28     9  1  8  1  4  9  6  4  6
   7    10  0  8  0  6  8 10 10 10           29     9  2  8  3  2 10  7  7  4
   8    10  5  8  4  0  9  4  3  2           30    10  2  8  2  2  9  6  5  6
   9    10  0  8  0  2  9  5  2  8           31    10  1  8  2  3  9  7  6  6
  10    10  0  8  4  7  9  8 10 10           32    10  2  8  2  6  9  7  6  7
  11     9  3  7  3  2  9  4  5  6           33    10  6  8  6  0  9  6  3  6
  12    10  0  8  0  0  9  3  3  1           34    10  6  8  6  0  9  6  4  6
  13    10  0  8  0  0  9  4  3  1           35    10  5  8  4  0  9  5  3  4
  14     9  0  8  0  0  9  2  3  1           36    10  3  8  3  2  9  4  6  4
  15    10  2  8  2  2  9  7  6  1           37    10  2  8  2  2  7  3  6  1
  16    10  2  8  2  3  9  7  6  1           38    10  0  8  0  0  9  2  2  6
  17    10  2  8  2  3  9  8  6  1           39    10  0  8  0  0  9  2  8  6
  18    10  5  8  5  0  9  3  3  6           40    10  4  8  4  0  9  4  5  3
  19    10  3  8  3  0  9  3  3  5           41    10  0  8  0  0  9  2  3  6
  20    10  3  8  3  0  9  3  3  6           42     9  2  8  1  0 10  0  6  6
  21    10  6  8  6  0  9  3  3  6           43     9  2  8  1  3  9  5  6  3
  22    10  4  8  3  0  9  3  3  4
--------------------------------------------------------------------------
Data input
The program consists of two parts: data input and calculation. The procedure
is as follows.
First, the file name "S100" is read. The program then asks for the number of
data (selected peaks and fragment ions), the number of samples to be
discriminated, and the sample names. The data are then entered one sample at
a time. An input error can be corrected after all data have been entered:
"1" is entered if there is no error and "2" if the data need to be corrected.
The rest is calculated automatically.
Results and discussion
Cluster analysis
The cluster analysis [1,2] was performed using the matrix shown in Table 5.
As a result, the minimum Euclidean distance (MIED) between two samples was
obtained. The Euclidean distance was calculated as follows.
$$D = \sqrt{\sum_{i} (x_i - y_i)^2}$$
D: Euclidean distance; x_i: coordinates of one sample;
y_i: coordinates of the other sample.
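As a minimal sketch, the Euclidean distance between two rows of Table 5 can be computed as below. The function name `euclidean_distance` is hypothetical, and the code assumes the ordinary (unweighted) Euclidean distance over the nine normalized values. The four rows are copied from Table 5.

```python
import math


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length value lists."""
    if len(x) != len(y):
        raise ValueError("both samples need the same number of values")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Normalized values for peaks 1-9, copied from Table 5.
sample_16 = [10, 2, 8, 2, 3, 9, 7, 6, 1]
sample_17 = [10, 2, 8, 2, 3, 9, 8, 6, 1]
sample_19 = [10, 3, 8, 3, 0, 9, 3, 3, 5]
sample_25 = [10, 3, 8, 3, 0, 9, 4, 3, 6]

print(euclidean_distance(sample_16, sample_17))            # 1.0
print(round(euclidean_distance(sample_19, sample_25), 3))  # 1.414
```

Samples 16 and 17 differ only in peak 7, and samples 19 and 25 differ by one block in peaks 7 and 9.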
The MIED indicates the similarity of samples, and we used it to judge whether
samples were the same or not. The judgment criteria were determined from an
experiment on more than five hundred samples. If the MIED was less than five,
we judged the samples to be the same. If it was larger than five and less
than ten, the samples were judged similar. Further, if it was larger than ten
and less than twenty, the samples might be similar. A dendrogram was prepared
from the MIED and is shown in Figure 3. It gives an easily interpreted visual
representation of the similarity among samples: the more closely two samples
are connected, the more similar they are.
In this dendrogram, we judged that samples 16 and 17 belong to the same
group. Similarly, samples 12 and 13, 19 and 20, and 33 and 34 were judged to
belong to the same groups. In fact, the samples in each of these pairs were
seized from the same suspect, and their Euclidean distances are less than
five. Samples 19 and 25 were seized from different suspects but belong to the
same group because the MIED between the two samples is less than five. It
turned out that these samples were purchased from the same seller.
Fig.3 Dendrogram of cluster analysis
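The judgment rule can be illustrated with a short nearest-neighbour search over a few rows of Table 5. This is a sketch under stated assumptions, not the authors' program. It reads the MIED of a sample as its smallest Euclidean distance to any other sample in the set. The names `judge` and `nearest_neighbour` are hypothetical, and the handling of distances exactly equal to 5 or 10, or of 20 and above, is assumed because the text does not specify it.

```python
import math

# Normalized values for peaks 1-9, copied from Table 5 (subset of samples).
SAMPLES = {
    12: [10, 0, 8, 0, 0, 9, 3, 3, 1],
    13: [10, 0, 8, 0, 0, 9, 4, 3, 1],
    16: [10, 2, 8, 2, 3, 9, 7, 6, 1],
    17: [10, 2, 8, 2, 3, 9, 8, 6, 1],
    19: [10, 3, 8, 3, 0, 9, 3, 3, 5],
    20: [10, 3, 8, 3, 0, 9, 3, 3, 6],
    25: [10, 3, 8, 3, 0, 9, 4, 3, 6],
    33: [10, 6, 8, 6, 0, 9, 6, 3, 6],
    34: [10, 6, 8, 6, 0, 9, 6, 4, 6],
}


def judge(mied):
    """Map a minimum Euclidean distance to the judgment used in the text."""
    if mied < 5:
        return "same"
    if mied < 10:   # assumption: exactly 5 counts as "similar"
        return "similar"
    if mied < 20:   # assumption: exactly 10 counts as "possibly similar"
        return "possibly similar"
    return "not judged similar"  # assumption: the text gives no rule here


def nearest_neighbour(number, samples):
    """Return (closest other sample, distance); ties go to the lower number."""
    candidates = (
        (math.dist(samples[number], row), other)
        for other, row in samples.items()
        if other != number
    )
    mied, partner = min(candidates)
    return partner, mied


for number in SAMPLES:
    partner, mied = nearest_neighbour(number, SAMPLES)
    print(number, partner, round(mied, 3), judge(mied))
```

`math.dist` requires Python 3.8 or later. Within this subset, each sample's nearest neighbour is one of the partners named in the text, for example sample 17 for sample 16 and sample 34 for sample 33.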
Conclusion
Using cluster analysis, samples that were seized from one suspect formed the
same group. Further, even when different suspects had purchased from the same
seller, their samples belonged to the same group, so the relation between
suspects and sellers became apparent. Cluster analysis was used to compare
the similarity between samples, and with the use of a personal computer the
analysis was performed quickly and easily.
References
1) D.G. Kleinbaum, L.L. Kupper and K.E. Muller. Applied Regression Analysis and
Other Multivariable Methods. PWS-KENT Publishing Company, Boston, 1988.
2) C. Chatfield and A.J. Collins. Introduction to Multivariate Analysis.
Chapman & Hall Ltd., London, 1988.
3) M. Hida, T. Mitsui and Y. Fujimura. Identification of unknown synthetic
resin by means of multivariate analysis of pyrograms. Nippon Kagaku
Kaishi. 6: 972-6 (1989).
4) T. Mitsui, M. Hida, and Y. Fujimura. Searching of medicines from pyrograms
using personal computer. Eisei Kagaku. 36: 226-33 (1990).
5) T. Mitsui, M. Hida, and Y. Fujimura. Identification of fibers by means of
multivariate analysis of pyrograms. Bunseki Kagaku. 39: 427-31 (1990).