This is a summary of the challenge records of 100 Language Processing Knock 2015.
:warning: **This is not a challenge record of 100 Language Processing Knock 2020; it covers the old 2015 version. Please note! :bangbang:**
Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). (Only Problem 00 and Problem 01 use Python 2.7.)
Review some advanced topics in programming languages while working on subjects dealing with text and strings.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 00 | slice, print() |
| Problem 01 | slice |
| Problem 02 | Anaconda, zip(), itertools.zip_longest(), unpacking an iterable into separate arguments with a leading *, str.join(), functools.reduce() |
| Problem 03 | len(), list.append(), str.split(), list.count() |
| Problem 04 | enumerate(), hash randomization is enabled by default since Python 3.3 |
| Problem 05 | n-gram, range() |
| Problem 06 | set(), set.union(), set.intersection(), set.difference() |
| Problem 07 | str.format(), string.Template, string.Template.substitute() |
| Problem 08 | chr(), str.islower(), input(), ternary operator |
| Problem 09 | Typoglycemia, random.shuffle() |
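
As a small aside, here is a minimal sketch of the character n-gram idea from Problem 05 above, using nothing but slicing and range(); the function name char_ngrams and the sample sentence are my own illustrative choices.

```python
def char_ngrams(text, n):
    """Return the list of character n-grams of `text` using slices."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# character bi-grams of a sample sentence
print(char_ngrams("I am an NLPer", 2))
# ['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
```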
Experience UNIX tools that are useful for research and data analysis. Through these reimplementations, you will experience the ecosystem of existing tools while improving your programming skills.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 10 | [UNIX commands]man and its Japanese localization, open(), shell scripts, [UNIX commands]wc, chmod, file execute permission |
| Problem 11 | str.replace(), [UNIX commands]sed, tr, expand |
| Problem 12 | io.TextIOBase.write(), [UNIX commands]cut, diff, short and long options of UNIX commands |
| Problem 13 | [UNIX commands]paste, str.rstrip(), Python's definition of "whitespace" |
| Problem 14 | [UNIX commands]echo, read, head |
| Problem 15 | io.IOBase.readlines(), [UNIX commands]tail |
| Problem 16 | [UNIX commands]split, math.ceil(), str.format(), floor division with // |
| Problem 17 | set.add(), [UNIX commands]cut, sort, uniq |
| Problem 18 | Lambda expression |
| Problem 19 | List comprehension, itertools.groupby(), list.sort() |
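
As one example of the reimplementations in this chapter, here is a minimal sketch of a head-like tool (Problem 14) that prints the first N lines of a file; the script name and the sample invocation are hypothetical, and any text file will do.

```python
import sys

def head(path, n):
    """Print the first n lines of the file at `path`, like the UNIX head command."""
    with open(path, encoding='utf-8') as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            print(line, end='')

if __name__ == '__main__':
    # e.g. python head.py somefile.txt 5
    head(sys.argv[1], int(sys.argv[2]))
```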
By applying regular expressions to the markup on Wikipedia pages, various pieces of information and knowledge can be extracted.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 20 | JSON manipulation, gzip.open(), json.loads() |
| Problem 21 | Regular expressions, raw string notation, raise, re.compile(), re.regex.findall() |
| Problem 22 | [Regular expressions]Greedy match, non-greedy match |
| Problem 23 | [Regular expressions]Back reference |
| Problem 24 | |
| Problem 25 | [Regular expressions]Positive lookahead, sorted() |
| Problem 26 | re.regex.sub() |
| Problem 27 | |
| Problem 28 | |
| Problem 29 | Use of web services, urllib.request.Request(), urllib.request.urlopen(), bytes.decode() |
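
To make the regex ingredients above concrete, here is a minimal sketch combining raw string notation, re.compile(), a non-greedy match, and a back reference; the MediaWiki-style snippet is made up for illustration and is not from the actual exercise data.

```python
import re

text = "''emphasis'' and '''strong emphasis''' in MediaWiki markup"

# \1 is a back reference: the closing quotes must match the opening ones.
# .+? is non-greedy, so each emphasised span is matched separately.
pattern = re.compile(r"('{2,3})(.+?)\1")
print([m.group(2) for m in pattern.finditer(text)])
# ['emphasis', 'strong emphasis']
```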
Apply the morphological analyzer MeCab to Natsume Soseki's novel "I Am a Cat" to obtain the statistics of the words in the novel.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 30 | conda, pip, apt, [MeCab]Installation, how to use, morphological analysis, generators, yield |
| Problem 31 | [Morphological analysis]Surface form |
| Problem 32 | [Morphological analysis]Base form / dictionary form, list comprehension |
| Problem 33 | [Morphological analysis]Sahen-connection nouns (nouns that form suru-verbs), list comprehension with a double loop |
| Problem 34 | |
| Problem 35 | [Morphological analysis]Sequences of consecutive nouns |
| Problem 36 | collections.Counter, collections.Counter.update() |
| Problem 37 | [matplotlib]Installation, bar graph, displaying Japanese labels, axis range, grid display |
| Problem 38 | [matplotlib]histogram |
| Problem 39 | [matplotlib]Scatter plot, Zipf's law |
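
A minimal sketch of how this chapter's pieces fit together, assuming the MeCab Python binding and an ipadic-style dictionary are installed; the noun counting with collections.Counter only loosely mirrors the frequency problems, and the sample sentence is just the opening of the novel.

```python
from collections import Counter
import MeCab

tagger = MeCab.Tagger()
noun_counter = Counter()

text = '吾輩は猫である。名前はまだ無い。'
# default ipadic output: one 'surface<TAB>features' line per morpheme, ending with EOS
for line in tagger.parse(text).splitlines():
    if line == 'EOS' or not line:
        continue
    surface, features = line.split('\t')
    pos = features.split(',')[0]        # part of speech (品詞)
    if pos == '名詞':                   # count nouns only in this sketch
        noun_counter[surface] += 1

print(noun_counter.most_common())
```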
Apply the dependency parser CaboCha to "I Am a Cat" and get hands-on experience with dependency trees and syntactic analysis.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 40 | [CaboCha]Installation, how to use, __str__(), __repr__(), repr() |
| Problem 41 | [Dependency analysis]Phrase and dependency |
| Problem 42 | |
| Problem 43 | |
| Problem 44 | [pydot-ng]Installation, directed graphs, and how to check the source of modules made in Python |
| Problem 45 | [Dependency analysis]Case, [UNIX commands]grep |
| Problem 46 | [Dependency analysis]Case frame / case grammar |
| Problem 47 | [Dependency analysis]Functional verb |
| Problem 48 | [Dependency analysis]Path from noun to root |
| Problem 49 | [Dependency analysis]Dependency path between nouns |
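
A minimal sketch of the parsing step behind Problems 40-41: reading CaboCha's lattice (-f1) output into chunk objects that keep their dependency target. The Chunk class and the file name neko.txt.cabocha are illustrative assumptions, not the chapter's exact code.

```python
class Chunk:
    def __init__(self, dst):
        self.surfaces = []   # surfaces of the morphemes in this chunk
        self.dst = dst       # index of the chunk this one depends on (-1 for the root)

def read_chunks(path):
    """Yield one list of Chunk objects per sentence from CaboCha -f1 output."""
    chunks = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            if line.startswith('* '):                     # chunk header: '* idx dstD ...'
                dst = int(line.split()[2].rstrip('D'))
                chunks.append(Chunk(dst))
            elif line.strip() == 'EOS':                   # end of sentence
                if chunks:
                    yield chunks
                chunks = []
            else:                                         # morpheme line: 'surface<TAB>features'
                chunks[-1].surfaces.append(line.split('\t')[0])

for sentence in read_chunks('neko.txt.cabocha'):
    for i, chunk in enumerate(sentence):
        print(i, ''.join(chunk.surfaces), '->', chunk.dst)
    break   # show only the first sentence
```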
An overview of various basic technologies of natural language processing through English text processing using Stanford Core NLP.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 50 | generator |
| Problem 51 | |
| Problem 52 | Stems, stemming, how to use the Snowball stemmer |
| Problem 53 | [Stanford Core NLP]Installation, how to use, subprocess.run(), XML parsing, xml.etree.ElementTree.ElementTree.parse(), xml.etree.ElementTree.ElementTree.iter() |
| Problem 54 | [Stanford Core NLP]Part of speech, lemma, XML parsing, xml.etree.ElementTree.Element.findtext() |
| Problem 55 | [Stanford Core NLP]Named entity, XPath, xml.etree.ElementTree.Element.iterfind() |
| Problem 56 | [Stanford Core NLP]Co-reference |
| Problem 57 | [Stanford Core NLP]Dependencies, [pydot-ng]Directed graph |
| Problem 58 | [Stanford Core NLP]Subject, predicate, object |
| Problem 59 | [Stanford Core NLP]Phrase structure analysis, S-expressions, recursive calls, sys.setrecursionlimit(), threading.stack_size() |
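
A minimal sketch of the XML-parsing side of Problems 53-54 with xml.etree.ElementTree, assuming Stanford CoreNLP has already written its output to nlp.txt.xml (an assumed file name) in its usual layout with token elements holding word, lemma and POS children.

```python
import xml.etree.ElementTree as ET

tree = ET.parse('nlp.txt.xml')
root = tree.getroot()

# iterate over every <token> element and read its child elements
for token in root.iter('token'):
    word = token.findtext('word')
    lemma = token.findtext('lemma')
    pos = token.findtext('POS')
    print('{}\t{}\t{}'.format(word, lemma, pos))
```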
Learn how to build and search databases using Key Value Store (KVS) and NoSQL. We will also develop a demo system using CGI.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 60 | [LevelDB]Installation, how to use, str.encode(), bytes.decode() |
| Problem 61 | [LevelDB]Search, Unicode code point, ord() |
| Problem 62 | [LevelDB]Enumeration |
| Problem 63 | JSON manipulation, json.dumps() |
| Problem 64 | [MongoDB]Installation,How to use,Interactive shell,Bulk insert,index |
| Problem 65 | [MongoDB]Search, ObjectId and how to handle types not found in the JSON conversion table |
| Problem 66 | |
| Problem 67 | |
| Problem 68 | [MongoDB]sort |
| Problem 69 | Web server, CGI, HTML escaping, html.escape(), html.unescape(), [MongoDB]Search with multiple conditions |
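
A minimal sketch of the MongoDB side of this chapter using pymongo, assuming a local mongod is running; the database name, collection name and documents are placeholders, not the chapter's actual data set.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient('localhost', 27017)
collection = client['testdb']['artist']

# bulk insert and an index on the 'name' field (cf. Problem 64)
collection.insert_many([{'name': 'Oasis', 'area': 'United Kingdom'},
                        {'name': 'Queen', 'area': 'United Kingdom'}])
collection.create_index([('name', ASCENDING)])

# search by name (cf. Problem 65)
print(collection.find_one({'name': 'Queen'}))
```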
Build a sentiment (positive/negative) analyzer using machine learning. You will also learn how to evaluate the method.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 70 | [Machine learning]Automatic classification, labels, supervised / unsupervised learning |
| Problem 71 | Stop words, assertions, assert |
| Problem 72 | [Machine learning]Feature |
| Problem 73 | [NumPy]Installation, matrix operations, [Machine learning]Logistic regression, vectorization, hypothesis function, sigmoid function, objective function, steepest descent (gradient descent), learning rate and number of iterations |
| Problem 74 | [Machine learning]Prediction |
| Problem 75 | [Machine learning]Feature weights, [NumPy]Getting the indices of a sorted result |
| Problem 76 | |
| Problem 77 | Accuracy, precision, recall, F1 score |
| Problem 78 | [Machine learning]5-fold cross-validation |
| Problem 79 | [matplotlib]Line graph |
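
A minimal sketch of the logistic-regression ingredients listed for Problem 73 (sigmoid, hypothesis, objective-function gradient, learning rate), vectorized with NumPy; the tiny data set is made up and the feature extraction step is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, iterations=1000):
    """Learn a weight vector by batch (steepest-descent) gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        h = sigmoid(X.dot(theta))           # hypothesis for all examples at once
        grad = X.T.dot(h - y) / len(y)      # gradient of the log-loss objective
        theta -= alpha * grad               # alpha is the learning rate
    return theta

# toy data: a bias column plus one feature, labels 0/1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
theta = train(X, y)
print(theta, sigmoid(X.dot(theta)))         # weights and predicted probabilities
```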
Build a word-context co-occurrence matrix from a large corpus and learn vectors that represent word meanings. The word vectors are then used to compute word similarity and analogies.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 80 | Word vectorization, bz2.open() |
| Problem 81 | [Word vector]Dealing with compound words |
| Problem 82 | |
| Problem 83 | Object serialization, pickle.dump(), pickle.load() |
| Problem 84 | [Word vector]Word-context matrix, PPMI (positive pointwise mutual information), [SciPy]Installation, handling sparse matrices, serialization, collections.OrderedDict |
| Problem 85 | Principal component analysis (PCA), [scikit-learn]Installation, PCA |
| Problem 86 | |
| Problem 87 | Cosine similarity |
| Problem 88 | |
| Problem 89 | Additive composition, analogy |
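
A minimal sketch of the cosine similarity used in Problem 87; the random 300-dimensional vectors below are only stand-ins for the word vectors actually learned in this chapter.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec_a = np.random.rand(300)   # stand-in for one word vector
vec_b = np.random.rand(300)   # stand-in for another word vector
print(cosine_similarity(vec_a, vec_b))
```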
Use word2vec to learn vectors that represent word meanings, and evaluate them against gold-standard data. You will also experience clustering and visualization of the vectors.
| Link to post | What I learned mainly, what I learned in the comments, etc. |
|---|---|
| Problem 90 | [word2vec]Installation, how to use |
| Problem 91 | |
| Problem 92 | |
| Problem 93 | |
| Problem 94 | |
| Problem 95 | Spearman's rank correlation coefficient, dynamically adding members to instances, exponentiation with ** |
| Problem 96 | |
| Problem 97 | Classification, clustering, K-Means, [scikit-learn]K-Means |
| Problem 98 | Hierarchical clustering, Ward's method, dendrogram, [SciPy]Ward's method, dendrogram |
| Problem 99 | t-SNE, [scikit-learn]t-SNE, [matplotlib]Labeled scatter plot |
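
A minimal sketch of K-Means clustering of word vectors with scikit-learn, roughly in the spirit of Problem 97; the word list and random vectors are placeholders for the word2vec vectors used in the chapter.

```python
import numpy as np
from sklearn.cluster import KMeans

words = ['Japan', 'France', 'Brazil', 'Egypt']        # placeholder labels
vectors = np.random.rand(len(words), 300)             # placeholder 300-dimensional vectors

kmeans = KMeans(n_clusters=2, random_state=0).fit(vectors)
for word, cluster in zip(words, kmeans.labels_):
    print(word, cluster)
```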
It took eight months, but I managed to get through all 100 knocks. I am very grateful to Dr. Okazaki for publishing such a wonderful set of problems together with the data corpus.
I was also really encouraged by the comments, edit requests, likes, stocks, follows, and mentions on blogs and social media. Thank you to everyone who kept up with me to the end.
I hope the articles I posted will be helpful to those who take on the challenge after me.