python - k821209/pipelines GitHub Wiki

xml

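import xml.etree.ElementTree as elemTree

# labels (list of XML file names) and dic (old base name -> new base name)
# are assumed to be defined earlier
for l in labels:
    tree = elemTree.parse(l)  # l: XML file name
    a = tree.find("filename")  # the <filename> element
    t = dic[a.text.split('.')[0]]  # look up the new base name
    a.text = t + '.jpg'
    tree.write(t + '.xml')  # save under the new base name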
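The same rename can be tried on an in-memory document without touching any files. This is only a sketch: the annotation layout, the `rename_map` table, and the helper name `rename_filename` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation and rename table, made up for this example
xml_text = "<annotation><filename>img_old.jpg</filename></annotation>"
rename_map = {"img_old": "img_new"}

def rename_filename(xml_string, mapping):
    """Return (new_base_name, updated_xml) with <filename> rewritten via mapping."""
    root = ET.fromstring(xml_string)
    node = root.find("filename")
    old_base = node.text.split(".")[0]   # strip the extension
    new_base = mapping[old_base]         # raises KeyError if the name is unknown
    node.text = new_base + ".jpg"
    return new_base, ET.tostring(root, encoding="unicode")

new_base, updated = rename_filename(xml_text, rename_map)
print(new_base)
print(updated)
```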

Parallelization

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from multiprocessing import Pool, Manager
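def skrun(fis, i):
    # X and Y (features and labels) are assumed to be defined at module level.
    # Seed with i: forked workers can inherit identical NumPy RNG state,
    # which would otherwise make different workers repeat the same runs.
    rf = RandomForestClassifier(n_estimators=2000, n_jobs=1, random_state=i)
    X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=i)
    rf.fit(X_train, y_train)
    fi = rf.feature_importances_
    fis.append(fi)  # store this run's feature importances
manager = Manager()
fis = manager.list()  # shared list that any worker can append to; it simply collects the results coming out in parallel
p   = Pool(10)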

for i in range(0, 1000):
    p.apply_async(skrun, args=(fis,i))
p.close()
p.join()

https://m.blog.naver.com/townpharm/220951524843

https://stackoverflow.com/questions/8533318/multiprocessing-pool-when-to-use-apply-apply-async-or-map

So, if you need to run a function in a separate process, but want the current process to block until that function returns, use Pool.apply. Like Pool.apply, Pool.map blocks until the complete result is returned.

If you want the Pool of worker processes to perform many function calls asynchronously, use Pool.apply_async. The order of the results is not guaranteed to be the same as the order of the calls to Pool.apply_async.

Notice also that you could call a number of different functions with Pool.apply_async (not all calls need to use the same function).

In contrast, Pool.map applies the same function to many arguments. However, unlike Pool.apply_async, the results are returned in an order corresponding to the order of the arguments.
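A small sketch of the three calls side by side. It uses `multiprocessing.pool.ThreadPool`, which exposes the same `apply` / `apply_async` / `map` interface as `Pool`, so that it runs anywhere without pickling concerns; swapping in `Pool` is assumed to behave the same for a picklable top-level function. The names `square` and `compare_pool_calls` are made up for this example.

```python
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool

def square(x):
    return x * x

def compare_pool_calls(n_workers=4):
    with ThreadPool(n_workers) as pool:
        # apply: blocks until this single call has returned
        single = pool.apply(square, args=(3,))

        # apply_async: returns AsyncResult handles immediately; .get() blocks.
        # The calls may finish in any order, but keeping the handles in a list
        # lets us read the results back in call order.
        handles = [pool.apply_async(square, args=(i,)) for i in range(5)]
        collected = [h.get() for h in handles]

        # map: blocks, and results follow the order of the arguments
        mapped = pool.map(square, range(5))
    return single, collected, mapped

single, collected, mapped = compare_pool_calls()
print(single, collected, mapped)
```

Appending to a shared list from inside the workers, as in the random forest snippet above, gives no ordering guarantee; collecting `AsyncResult` handles is one way to keep track of which result belongs to which call.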

Encoding

https://financedata.github.io/posts/faq_crawling_data_encoding.html
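As a small illustration of the underlying issue (not taken from the linked post): bytes have to be decoded with the codec they were written in. The sample string is made up, and EUC-KR is picked only as a typical legacy Korean encoding.

```python
text = "한글 데이터"  # made-up sample text

raw_euckr = text.encode("euc-kr")  # bytes as a legacy Korean page might send them
raw_utf8 = text.encode("utf-8")

# Decoding with the matching codec restores the original text
roundtrip_euckr = raw_euckr.decode("euc-kr")
roundtrip_utf8 = raw_utf8.decode("utf-8")

# Decoding EUC-KR bytes as UTF-8 fails, because the byte sequence is not valid UTF-8
try:
    wrong = raw_euckr.decode("utf-8")
except UnicodeDecodeError:
    wrong = None

print(roundtrip_euckr == text, roundtrip_utf8 == text, wrong)
```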

Zero-padding a number

'10'.zfill(10)
# result: '0000000010'

Base conversion

# decimal -> base n (works for n >= 0 and 2 <= base <= 16)
def convert(n, base):
    T = "0123456789ABCDEF"
    q, r = divmod(n, base)
    if q == 0:
        return T[r]
    else:
        return convert(q, base) + T[r]
# base n -> decimal
int( '12345', 7 )
# result: 3267
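A quick round-trip check of the two directions: convert to base n with the recursive helper, then back with `int(s, base)`. The test values are arbitrary.

```python
def convert(n, base):
    """Convert a non-negative integer n to a string in the given base (2 to 16)."""
    digits = "0123456789ABCDEF"
    q, r = divmod(n, base)
    if q == 0:
        return digits[r]
    return convert(q, base) + digits[r]

# Arbitrary (value, base) pairs; each should survive the round trip
for value, base in [(255, 16), (10, 2), (3267, 7)]:
    encoded = convert(value, base)
    assert int(encoded, base) == value
    print(value, base, encoded)
```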

regular expression

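import re
# text: any string containing an ID such as AT1G01010 (case-insensitive because of upper())
re.search('AT[0-9]G[0-9]+',text.upper()).group(0)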
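`re.search` returns `None` when nothing matches, so calling `.group(0)` directly raises `AttributeError` on such input. A sketch of a guarded version; the helper name `find_gene_id` and the sample strings are made up.

```python
import re

GENE_ID = re.compile(r'AT[0-9]G[0-9]+')

def find_gene_id(text):
    """Return the first ID matching the pattern in text, or None if there is none."""
    m = GENE_ID.search(text.upper())
    return m.group(0) if m else None

print(find_gene_id("gene=at1g01010.1;Note=example"))
print(find_gene_id("no identifier here"))
```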

coloring the letters

http://www.lihaoyi.com/post/BuildyourownCommandLinewithANSIescapecodes.html
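A minimal sketch of the idea: wrap the text in an ANSI color escape code and reset afterwards. The `COLORS` table and the `colorize` helper are made up for this example; the codes are the standard foreground colors and only show up as colors in a terminal that supports ANSI escapes.

```python
RESET = "\033[0m"
COLORS = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
}

def colorize(text, color):
    """Wrap text in the escape code for color, followed by a reset."""
    return COLORS[color] + text + RESET

print(colorize("error", "red"), colorize("ok", "green"))
```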

combination

import itertools as it
list(it.combinations([1,2,3,4],2))
# result: [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

intersection

list(set(list1).intersection(list2))

string_list to list

import ast
x = ast.literal_eval('''['Cre03.g149050.t1.1.v5.5', 'Cre03.g149100.t1.2.v5.5']''')
x

data type

http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html

reverse dictionary

# Python 3: use items() (iteritems() exists only in Python 2)
dicChr2N = {b: a for a, b in dicN2chr.items()}
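A plain inversion only works cleanly when the values are unique; with repeated values, later keys overwrite earlier ones. A sketch with made-up sample data, plus a hypothetical `invert_grouped` helper that keeps every key:

```python
from collections import defaultdict

dicN2chr = {1: "Chr1", 2: "Chr2", 3: "Chr3"}  # made-up sample mapping
dicChr2N = {chrom: n for n, chrom in dicN2chr.items()}
print(dicChr2N)

def invert_grouped(d):
    """Invert d, collecting all keys that share a value into a list."""
    out = defaultdict(list)
    for key, value in d.items():
        out[value].append(key)
    return dict(out)

print(invert_grouped({"a": 1, "b": 1, "c": 2}))
```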

import

import sys
sys.path.append('../')

cigar parsing

http://okko73313.blogspot.de/2012/04/using-regular-expressions-to-analyze.html

In [60]: match = re.findall(r'(\d+)(\w)', '40M25N5M')

In [61]: match
Out[61]: [('40', 'M'), ('25', 'N'), ('5', 'M')]
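Building on the `findall` output, a sketch that turns the lengths into integers and sums the operations that consume the reference (M, D, N, = and X in the SAM format). The helper names `parse_cigar` and `reference_span` are made up, and the pattern is narrowed from `\w` to the CIGAR operation letters.

```python
import re

CIGAR_OP = re.compile(r'(\d+)([MIDNSHP=X])')

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")

print(parse_cigar('40M25N5M'))
print(reference_span('40M25N5M'))
```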

pickle

import pickle
pickle.dump( favorite_color, open( "save.p", "wb" ) )  # save
favorite_color = pickle.load( open( "save.p", "rb" ) )  # load
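The one-liners above leave closing the files to the interpreter. A sketch of the same round trip with `with` blocks, so the files are closed right away; the sample dictionary and the temporary directory are made up for this example.

```python
import os
import pickle
import tempfile

favorite_color = {"lion": "yellow", "kitty": "red"}  # made-up sample data

# Write into a temporary directory so the example leaves the working directory alone
path = os.path.join(tempfile.mkdtemp(), "save.p")

with open(path, "wb") as f:
    pickle.dump(favorite_color, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == favorite_color)
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.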