Python: complex string algorithm

I have a list:

listcdtitles =

[""" Liszt, Hungarian Rhapsody #6 {'Pesther Carneval'}; 2 Episodes from Lenau's 'Faust'; 'Hunnenschlacht' Symphonic Poem. (NW German Phil./ Kulka) """,

""" Puccini, Verdi, Gounod, Bizet: Arias & Duets from Butterfly, Tosca, Boheme, Turandot, I Vespri, Faust, Carmen. (Fiamma Izzo d'Amico & Peter Dvorsky w.Berlin Radio Symph./Paternostro) """,

""" Tchaikovsky, 'The Tempest' Fantasy. Liszt, Symphonic Poem #1. (London Symph./Butt) """,

""" Duffy, John: 'Heritage: Civilization and the Jews'- Fanfare & Chorale, Symphonic Dances + Orchestral Suite. Bernstein, 'On the Town' Dance Episodes. (Royal Phil./R.Williams) """,

""" Lilien, Ignace {1897-1963}: Songs, 1920-1935. (Anja van Wijk, mezzo & Frans van Ruth, piano) """,

""" Hindemith, Trauermusik. Purcell, 'Fairy Queen' Suite. Rossini, String Sonata #6. Petrov, 'Creation of the World' Ballet Suite. Bartok, Romanian Folkdances Sz 56. Tartini, Flute Concerto in G {w.A.Maiorov} (Leningrad Orch.for Ancient & Modern Music/ Serov) """,

""" Bizet, Verdi, Massenet, Puccini: Arias from Carmen, Rigoletto, Werther, Manon Lescaut, Tosca, Turandot + Songs by Lara, Di Capua et al. (Peter Dvorsky, tenor w.Bratislava Orch./Lenard {Also performing 'Carmen' Overt.& 'Thais' Meditation}. Rec.Live, 10/87) """,

""" Fantini, Rauch, C.Straus, Priuli, Bertali: 'Festival Mass at the Imperial Court of Vienna, 1648' (Yorkshire Bach Choir & Baroque Soloists + Baroque Brass of London/Seymour) """,

""" Vinci, Leonardo {c.1690-1730}: Arias from Semiramide Riconosciuta, Didone Abbandonata, La Caduta dei Decemviri, Lo Cecato Fauzo, La Festa de Bacco, Catone in Utica. (Maria Angeles Peters sop. w.M.Carraro conducting) """,

""" Gluck, Mozart, Beethoven, Weber, Verdi, Wagner, Ponchielli, Mascagni, Puccini: Arias from Alceste, Don Giovanni, Fidelio, Oberon, Ballo, Tristan, Walkure, Siegfried, Gotterdammerung, Gioconda, Cavalleria, Tosca. (Helene Wildbrunn. Rec.1919-24) """,

""" Stanley, Wesley, Stubley, Boyce, Handel, Heron, Russell, Hook: '18th Century Organ Music on Period Instruments' (Same instruments and artist as above) """,

""" Reimann, 'Unrevealed' for Baritone & String Quartet to Texts by Lord Byron {R.Salter w.Kreuzberger Quartet}; Variations for Piano (David Levine) """,

""" Bruckner, Symphony #9. (Berlin Philharmonic/ Jochum. Rec. 'live', 11/28/77) """,

""" Bruckner, Symphony #5. (Haas Edition. BBC Symph./ Horenstein. Rec.9/71) """,

..............................]

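The list has about 14,000 elements.

I want to group together the strings that share similar words.

Any ideas on how to do this? I don't think there is a right or wrong way.

Many thanks for any suggestions.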

Solution:

I am new to Python, but I wrote some sample code that computes a similarity score between the entries in that list.

The code is below.

import re

listcdtitles = [""" Liszt, Hungarian Rhapsody #6 {'Pesther Carneval'}; 2 Episodes from Lenau's 'Faust'; 'Hunnenschlacht' Symphonic Poem. (NW German Phil./ Kulka) """,

""" Puccini, Verdi, Gounod, Bizet: Arias & Duets from Butterfly, Tosca, Boheme, Turandot, I Vespri, Faust, Carmen. (Fiamma Izzo d'Amico & Peter Dvorsky w.Berlin Radio Symph./Paternostro) """,

""" Tchaikovsky, 'The Tempest' Fantasy. Liszt, Symphonic Poem #1. (London Symph./Butt) """,

""" Duffy, John: 'Heritage: Civilization and the Jews'- Fanfare & Chorale, Symphonic Dances + Orchestral Suite. Bernstein, 'On the Town' Dance Episodes. (Royal Phil./R.Williams) """,

""" Lilien, Ignace {1897-1963}: Songs, 1920-1935. (Anja van Wijk, mezzo & Frans van Ruth, piano) """,

""" Hindemith, Trauermusik. Purcell, 'Fairy Queen' Suite. Rossini, String Sonata #6. Petrov, 'Creation of the World' Ballet Suite. Bartok, Romanian Folkdances Sz 56. Tartini, Flute Concerto in G {w.A.Maiorov} (Leningrad Orch.for Ancient & Modern Music/ Serov) """,

""" Bizet, Verdi, Massenet, Puccini: Arias from Carmen, Rigoletto, Werther, Manon Lescaut, Tosca, Turandot + Songs by Lara, Di Capua et al. (Peter Dvorsky, tenor w.Bratislava Orch./Lenard {Also performing 'Carmen' Overt.& 'Thais' Meditation}. Rec.Live, 10/87) """,

""" Fantini, Rauch, C.Straus, Priuli, Bertali: 'Festival Mass at the Imperial Court of Vienna, 1648' (Yorkshire Bach Choir & Baroque Soloists + Baroque Brass of London/Seymour) """,

""" Vinci, Leonardo {c.1690-1730}: Arias from Semiramide Riconosciuta, Didone Abbandonata, La Caduta dei Decemviri, Lo Cecato Fauzo, La Festa de Bacco, Catone in Utica. (Maria Angeles Peters sop. w.M.Carraro conducting) """,

""" Gluck, Mozart, Beethoven, Weber, Verdi, Wagner, Ponchielli, Mascagni, Puccini: Arias from Alceste, Don Giovanni, Fidelio, Oberon, Ballo, Tristan, Walkure, Siegfried, Gotterdammerung, Gioconda, Cavalleria, Tosca. (Helene Wildbrunn. Rec.1919-24) """,

""" Stanley, Wesley, Stubley, Boyce, Handel, Heron, Russell, Hook: '18th Century Organ Music on Period Instruments' (Same instruments and artist as above) """,

""" Reimann, 'Unrevealed' for Baritone & String Quartet to Texts by Lord Byron {R.Salter w.Kreuzberger Quartet}; Variations for Piano (David Levine) """,

""" Bruckner, Symphony #9. (Berlin Philharmonic/ Jochum. Rec. 'live', 11/28/77) """,

""" Bruckner, Symphony #5. (Haas Edition. BBC Symph./ Horenstein. Rec.9/71) """]

entryDictionary = {}
i = 0
for entry in listcdtitles:
    # remove unnecessary characters (anything that is not a word character or a space)
    entry = re.sub(r'[^\w ]', '', entry.lower())
    # split the entry into words and store the word list in the dictionary
    entryDictionary[i] = entry.split(" ")
    i = i + 1

# print the dictionary
print("Entries")
print(entryDictionary)

# define a score matrix, compare the words in each entry and if
# a word is the same in both entries, that is one point
scoreMatrix = []
for k in range(i):
    scoreMatrix.append([])
    for j in range(i):
        if j > k:
            scoreMatrix[k].append(0)
        else:
            scoreMatrix[k].append("-")

for k in range(i - 1):
    entry1 = entryDictionary[k]
    for j in range(k + 1, i):
        entry2 = entryDictionary[j]
        for kk in range(len(entry1)):
            for jj in range(len(entry2)):
                # skip the empty strings produced by splitting on consecutive spaces
                if entry1[kk] != "" and entry1[kk] == entry2[jj]:
                    scoreMatrix[k][j] = scoreMatrix[k][j] + 1

print("Score Matrix (Higher numbers denote higher similarity between two entries)")

# header row
print(repr("").rjust(10), end=" ")
for k in range(i - 1):
    print(repr("Entry " + str(k)).rjust(10), end=" ")
print(repr("Entry " + str(i - 1)).rjust(10))

# one row per entry
for k in range(i):
    print(repr("Entry " + str(k)).rjust(10), end=" ")
    for j in range(i - 1):
        print(repr(scoreMatrix[k][j]).rjust(10), end=" ")
    print(repr(scoreMatrix[k][i - 1]).rjust(10))

The result is as follows:

Score Matrix (Higher numbers denote higher similarity between two entries)

        ''  'Entry 0'  'Entry 1'  'Entry 2'  'Entry 3'  'Entry 4'  'Entry 5'  'Entry 6'  'Entry 7'  'Entry 8'  'Entry 9' 'Entry 10' 'Entry 11' 'Entry 12' 'Entry 13'
 'Entry 0'        '-'          2          3          2          0          1          1          0          1          1          0          0          0          0
 'Entry 1'        '-'        '-'          0          0          0          0         11          0          2          5          0          0          0          0
 'Entry 2'        '-'        '-'        '-'          3          0          1          0          1          0          0          0          0          0          0
 'Entry 3'        '-'        '-'        '-'        '-'          0          4          0          2          0          0          2          0          0          0
 'Entry 4'        '-'        '-'        '-'        '-'        '-'          0          1          0          0          0          0          1          0          0
 'Entry 5'        '-'        '-'        '-'        '-'        '-'        '-'          0          3          1          0          1          1          0          0
 'Entry 6'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          0          2          5          0          1          0          0
 'Entry 7'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          0          0          0          0          0          0
 'Entry 8'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          2          0          0          0          0
 'Entry 9'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          0          0          0          0
'Entry 10'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          0          0          0
'Entry 11'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          0          0
'Entry 12'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'          2
'Entry 13'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'        '-'
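
The score matrix only ranks pairs of entries; it does not yet bundle them into groups, which is what the question asks for. As a rough sketch of one way to get from scores to groups, the code below computes the same pairwise score with collections.Counter and then merges any two entries whose score reaches a cutoff. The helper names (tokenize, overlap_score, group_similar) and the threshold value are my own assumptions for illustration, not part of the code above.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase, keep only word characters and spaces, split into words."""
    return re.sub(r"[^\w ]", "", title.lower()).split()


def overlap_score(words_a, words_b):
    """Count matching word pairs, the same quantity the nested loops accumulate."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def group_similar(titles, threshold=3):
    """Group titles whose pairwise score is at least `threshold`.

    The threshold is an assumed cutoff. Groups are merged transitively
    with a small union-find structure, so A~B and B~C puts A, B and C
    into one group even if A and C share no words.
    """
    tokens = [tokenize(t) for t in titles]
    parent = list(range(len(titles)))

    def find(x):
        # follow parent links to the group representative
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(titles)):
        for b in range(a + 1, len(titles)):
            if overlap_score(tokens[a], tokens[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for index in range(len(titles)):
        groups.setdefault(find(index), []).append(index)
    return list(groups.values())
```

Calling group_similar(listcdtitles, threshold=3) returns lists of entry indices. With about 14,000 titles this still compares roughly 98 million pairs, so it will probably be slow in pure Python, and the cutoff will likely need tuning by looking at a few of the resulting groups.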

Tags: string, python

Source: https://codeday.me/bug/20191209/2097009.html
