Final Project: Data Knowledge and Engineering
Mochamad Taufik Pratama-1103130243
This final project is about extracting information from articles. Ten articles are used, all taken from https://www.newsinlevels.com
The source code is explained below.
Extracting the information requires the following libraries:
from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.tree import Tree
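Next, define the function that collects the named entities from a text:
def get_continuous_chunks(text):
    # Tokenize, POS-tag, then run NLTK's named-entity chunker
    chunked = ne_chunk(pos_tag(word_tokenize(text)))
    continuous_chunk = []  # unique entities, in order of first appearance
    current_chunk = []     # words of the entity currently being assembled
    for i in chunked:
        if type(i) == Tree:
            # an entity subtree: join its tokens and extend the running entity
            current_chunk.append(" ".join([token for token, pos in i.leaves()]))
        elif current_chunk:
            # an ordinary token ends the running entity
            named_entity = " ".join(current_chunk)
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)
            current_chunk = []
    # keep an entity that runs up to the very end of the text
    if current_chunk:
        named_entity = " ".join(current_chunk)
        if named_entity not in continuous_chunk:
            continuous_chunk.append(named_entity)
    return continuous_chunk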
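The grouping step of this function can be tried without downloading any NLTK models. The sketch below is only a stand-in, not NLTK itself: a plain Python list plays the role of an entity subtree (a `Tree`) and a tuple plays the role of an ordinary tagged token. The name `group_entities` and the sample input are made up for illustration.

```python
def group_entities(chunked):
    """Join consecutive entity chunks into one name, keeping first occurrences only."""
    entities = []
    current = []
    for item in chunked:
        if isinstance(item, list):
            # stand-in for an nltk Tree: one entity chunk made of (token, tag) pairs
            current.append(" ".join(token for token, tag in item))
        elif current:
            # an ordinary token closes the entity collected so far
            name = " ".join(current)
            if name not in entities:
                entities.append(name)
            current = []
    if current:
        # flush an entity that is still pending when the input ends
        name = " ".join(current)
        if name not in entities:
            entities.append(name)
    return entities


sample = [
    [("Prince", "NNP"), ("George", "NNP")],
    ("visits", "VBZ"),
    [("North", "NNP")],
    [("Korea", "NNP")],
    ("and", "CC"),
    [("Prince", "NNP"), ("George", "NNP")],
    ("meets", "VBZ"),
    [("Nepal", "NNP")],
]
result = group_entities(sample)
print(result)  # ['Prince George', 'North Korea', 'Nepal']
```

Two adjacent chunks ("North" and "Korea") are merged into one entity, and the second "Prince George" is skipped because it is already in the list.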
# each blank line separates a different news article
berita = '''
A photo of Britain's Prince George has made animal rights groups angry. The photo was for the prince's third birthday.
It shows him offering white chocolate ice cream to his pet dog.
The charity, the Royal Society for the Prevention of Cruelty to Animals (RSPCA), said Prince George was trying to be kind to his dog, but it wasn't a good thing to do.
It said chocolate and ice cream are bad for dogs. The RSPCA said it did not advise others to do the same as George.
North Korea tries to launch a missile. The USA reacts. It sends a warship to Korea. The situation is getting more and more serious.
A supply ship is going to the warship. It needs protection. A Japanese warship goes with it.
This story is about a man. He is from the USA. He is a university professor. He goes to Nepal. He climbs a mountain. The mountain is covered in ice.
He does not want to die. He climbs out of the hole. People find him the next day.
There is a video about a killer whale baby. The baby orca was born in Sea World park in Texas. Her name is Takara.
This is the last baby orca born in the park. It is the end of the breeding programme in Sea World. Some people worry about orcas which do not live in the open ocean. Activists want to move the mother and the baby into a wildlife reservation.
Researchers have a new way to film whales in Antarctica. The digital tags have a camera. They also get information on the whales.
The information tells us where the whales get their food. The researchers want to protect the whales. The cameras show us what the whales world looks like.
Here is some animal news. It is from a zoo in the USA. The zookeepers let some animals play.
They give the animals musical instruments. They want the animals to have some fun. The otters play the keyboard. An orangutan plays the xylophone.
Here is news from India. A leopard is on the roof of a house. People start to panic. They are scared of the animal. They move away from the leopard.
One man tries to get away. The animal attacks him! The leopard moves into the village. It hides in a small house. It is scared of the people, too. Nobody else is injured.
Here is some news from Norway. A man is skydiving. Something flies by. It looks like a black rock. The man thinks that it is a meteorite.
Most meteorites burn up when they enter the atmosphere. However, some meteorites survive. The man is working with scientists. They are trying to find the meteorite.
The people are in Costa Rica. They upload the video on YouTube recently.
The video is amazing. It shows a man. He feeds a crocodile. He does not hold the fish in his hand. He holds it in his mouth.
Here is some news from a Washington zoo. It is about little lions. They must pass a swimming test.
People throw them into water, and the animals must swim. All four cats hold their heads above the water. Three of them swim. One cat does not want to swim. It gets out of the water quickly.
Small cats must pass the test at the zoo. The test covers their ability to swim. Water must be no danger to these cats.
'''
print(get_continuous_chunks(berita))
Running the print call above shows the result of the information extraction: a list of the named entities found in the articles.
The extraction result shows the important entities in the news. To see which of these entities are related to one another, we need a graph that depicts the relations between them.
To draw the entity relations, this project uses the API from plot.ly and its graph features. API access can be created at https://plot.ly/
The first step is to import the required libraries:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import plotly.plotly as py
from plotly.graph_objs import *
Then create the node and edge functions, which set the node positions and draw the lines between them:
def scatter_nodes(pos, labels=None, color=None, size=20, opacity=1):
    # pos is the dict of node positions, labels is the list of hover texts
    L = len(pos)
    trace = Scatter(x=[], y=[], mode='markers', marker=Marker(size=[]))
    for k in range(L):
        trace['x'].append(pos[k][0])
        trace['y'].append(pos[k][1])
    # a dict of Plotly node attributes
    attrib = dict(name='', text=labels, hoverinfo='text', opacity=opacity)
    trace = dict(trace, **attrib)  # merge the dicts trace and attrib
    trace['marker']['size'] = size
    return trace
def scatter_edges(G, pos, line_color=None, line_width=1):
    trace = Scatter(x=[], y=[], mode='lines')
    for edge in G.edges():
        # two endpoints per edge, then None to end the segment
        trace['x'] += [pos[edge[0]][0], pos[edge[1]][0], None]
        trace['y'] += [pos[edge[0]][1], pos[edge[1]][1], None]
    trace['hoverinfo'] = 'none'
    trace['line']['width'] = line_width
    if line_color is not None:  # when it is None a default Plotly color is used
        trace['line']['color'] = line_color
    return trace
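The edge trace packs every edge into one pair of coordinate lists, with a `None` after each pair of endpoints so that Plotly breaks the line there instead of joining one edge to the next. A minimal sketch of that layout in plain Python follows; the helper name `edge_coordinates` and the three sample positions are assumptions made for this example.

```python
def edge_coordinates(edges, pos):
    """Flatten edges into x and y lists, ending each segment with None."""
    xs, ys = [], []
    for a, b in edges:
        xs += [pos[a][0], pos[b][0], None]
        ys += [pos[a][1], pos[b][1], None]
    return xs, ys


# hypothetical positions for three nodes and two edges between them
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}
edges = [(0, 1), (1, 2)]
xs, ys = edge_coordinates(edges, pos)
print(xs)  # [0.0, 1.0, None, 1.0, 1.0, None]
print(ys)  # [0.0, 0.0, None, 0.0, 1.0, None]
```

Each edge contributes three entries per list, so a graph with n edges gives lists of length 3n.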
After setting up the nodes and edges, the next step is the annotation function, which gives each node a name and places the text at the node's position so the extracted words do not pile on top of each other:
def make_annotations(pos, text, font_size=14, font_color='rgb(25,25,25)'):
    L = len(pos)
    if len(text) != L:
        raise ValueError('The lists pos and text must have the same len')
    annotations = Annotations()
    for k in range(L):
        # one text annotation per node, placed at the node position
        annotations.append(
            Annotation(
                text=text[k],
                x=pos[k][0], y=pos[k][1],
                xref='x1', yref='y1',
                font=dict(color=font_color, size=font_size),
                showarrow=False)
        )
    return annotations
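The annotation text can be any list of strings with one entry per node. As an illustration only, the sketch below builds the same kind of annotation entries as plain dicts and uses entity names as the text. The helper name `label_annotations`, the two positions and the two names are assumptions for this example, not part of the project code.

```python
def label_annotations(pos, names, font_size=14):
    """Build one plain-dict annotation per node, using names as the visible text."""
    if len(names) != len(pos):
        raise ValueError("pos and names must have the same length")
    return [
        dict(
            text=names[k],
            x=pos[k][0], y=pos[k][1],
            xref="x", yref="y",
            showarrow=False,
            font=dict(size=font_size),
        )
        for k in range(len(pos))
    ]


# hypothetical positions for two nodes
pos = {0: (0.1, 0.9), 1: (0.5, 0.2)}
annotations = label_annotations(pos, ["Prince George", "RSPCA"])
print(annotations[0]["text"], annotations[0]["x"], annotations[0]["y"])
```

The length check matters because the text is matched to the nodes by index, so a missing name would shift every label after it.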
After adding the annotations, the next step is the adjacency matrix, which defines the nodes and which of them are connected by an edge:
Ad=np.array( [[0,1,1,0,1,0,0,1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1], # Adjacency matrix
[1,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,1,0,0,1,0,0,0],
[1,0,0,0,1,0,1,1,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0],
[0,0,0,0,1,0,1,1,0,1,0,1,0,1,1,0,0,0,1,0,0,1,0],
[1,0,0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0],
[0,0,1,0,1,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0,0,0,1],
[1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,1,0,0],
[1,0,0,0,1,1,1,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1],
[0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,1,0,0],
[0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,1,0,0,1,0,0,1,0],
[1,0,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,1,0,1],
[0,1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,1,1,0,1,0,0,1],
[1,1,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1],
[0,1,1,0,0,0,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1,0,0],
[1,0,0,1,1,1,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1],
[0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0,0],
[1,0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,0,0,0,0,1,0,1],
[1,1,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,1,0],
[1,0,0,1,0,1,0,1,0,1,0,0,0,1,0,1,0,1,0,1,0,0,1],
[0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,1,0,0,1],
[1,1,0,0,0,1,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,1,0],
[1,1,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,1,0,0,1],
[1,0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0]], dtype=float)
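The matrix above is written out by hand, and the post does not say how its entries were chosen. One possible way to derive such a matrix, assumed here purely for illustration, is co-occurrence: two entities are connected when they appear in the same article. The function name `cooccurrence_matrix`, the two shortened articles and the three entity names below are examples, not the data behind the matrix above.

```python
def cooccurrence_matrix(articles, entities):
    """Return an n x n 0/1 matrix: 1 when two entities share at least one article."""
    n = len(entities)
    matrix = [[0] * n for _ in range(n)]
    for article in articles:
        # indices of the entities mentioned in this article (simple substring match)
        present = [i for i, name in enumerate(entities) if name in article]
        for i in present:
            for j in present:
                if i != j:
                    matrix[i][j] = 1
    return matrix


articles = [
    "North Korea tries to launch a missile. The USA reacts.",
    "He is from the USA. He goes to Nepal.",
]
entities = ["North Korea", "USA", "Nepal"]
matrix = cooccurrence_matrix(articles, entities)
for row in matrix:
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```

A matrix built this way is symmetric and has a zero diagonal, and it has exactly one row per extracted entity, which keeps the node count and the label count equal.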
This part builds the graph that represents the relations between the extracted words:
Gr = nx.from_numpy_matrix(Ad)
position = nx.spring_layout(Gr)
labels = get_continuous_chunks(berita)  # the list that was printed earlier is assigned to labels
traceE = scatter_edges(Gr, position)
traceN = scatter_nodes(position, labels=labels)
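To see which edges an adjacency matrix produces, the conversion can be sketched in plain Python. The sketch assumes, as networkx does to my understanding for an undirected graph, that a nonzero entry at either [i][j] or [j][i] gives a single edge between i and j, and that a nonzero diagonal entry gives a self-loop. The helper name `undirected_edges` and the 3 x 3 sample matrix are made up for this example.

```python
def undirected_edges(matrix):
    """List each undirected edge once, as (smaller index, larger index)."""
    edges = set()
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# a small non-symmetric sample with one diagonal entry
small = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
edges = undirected_edges(small)
print(edges)  # [(0, 1), (1, 2), (2, 2)]
```

The hover labels are matched to the nodes by position, so it is worth checking that `len(labels)` equals the number of rows in the matrix before plotting.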
This part sets up the layout of the graph:
width = 500
height = 500
axis = dict(showline=False,  # hide axis line, grid, ticklabels and title
            zeroline=False,
            showgrid=False,
            showticklabels=False,
            title=''
            )
layout = Layout(title='Relasi entitas berita',  # "News entity relations"
                font=Font(),
                showlegend=False,
                autosize=False,
                width=width,
                height=height,
                xaxis=XAxis(axis),
                yaxis=YAxis(axis),
                margin=Margin(
                    l=40,
                    r=40,
                    b=85,
                    t=100,
                    pad=0,
                ),
                hovermode='closest',
                plot_bgcolor='#EFECEA',  # set background color
                )
The last step displays the graph as a figure:
data1 = Data([traceE, traceN])
fig = Figure(data=data1, layout=layout)
# label every node with its index number
fig['layout'].update(annotations=make_annotations(position, [str(k) for k in range(len(position))]))
py.iplot(fig, filename='Tubes')
Running all of the source code above produces the entity-relation graph.
Because the graph is built through the plotly API, it is also saved to the plotly account under the filename 'Tubes' and can be viewed there.






















