2017 Autumn Workshop: Data Intensive Biology (2017 生物醫學大數據處理秋季工作坊)

Registration page:

http://lsl.sinica.edu.tw/Services/Class/registration.php

Date: Oct 25, 2017, 9:00-12:00

Venue: Remote Conference Room, Humanities and Social Sciences Building (人文館遠距會議室)

Opening

9:00-9:20 Chung-Yen Lin, Ph.D. (林仲彥博士)

Lab of Systems Biology and Network Biology (http://eln.iis.sinica.edu.tw)

Institute of Information Science, Academia Sinica

Content: This talk briefly introduces our lab's recent research on biomedical big data and its applications. It also presents our published, publicly available web applications and databases, including the NGS-related multi-omics analysis platforms for model and non-model organisms.

[Slides in PDF]

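Note: Some tools in this course run on the Docker platform. If you bring your own laptop, please install the required packages beforehand; see the links for installation instructions (Windows 10: the Pro and Education editions are supported, the Home edition is not) (Mac).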

Elegance: Electronic Lab Notebook on Cloud -- Digitize your experimental designs and results into wisdom, from Discovery to Publication

9:20-10:00 Mr. Chi-Wei Huang (黃智偉先生)

Lab of Systems Biology and Network Biology (http://eln.iis.sinica.edu.tw)

Institute of Information Science, Academia Sinica

Content: Using and maintaining the electronic lab notebook

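The records and large volumes of data produced in a lab are stored electronically on a network server, a desktop or laptop computer, or the cloud. Users can then reach experiment content and research data with an ordinary web browser from desktop computers, laptops, and mobile devices such as smartphones and tablets. A user-friendly interface based on standard web editing helps users manage, share, search, back up, and print their records, and discuss problems online with collaborators at home and abroad. The lab's ideas and research history can thus be recorded, reviewed, and cross-linked, unaffected by time, place, or staff turnover. The lab's accumulated knowledge can be accessed and integrated quickly, and new ideas and insights can be uploaded at any time and shared immediately with collaborators, accelerating research.

The system is built on open-source software and embraces the spirit of Web 2.0. It integrates web tools (Drupal, Apache, Ajax), a database (MySQL), and programming languages (PHP, Java, C++), and it can be installed through a simple, user-friendly interface on different operating systems (Linux, Windows, Mac) or on cloud platforms. Without complex IT support, an individual or a small to medium-sized research team can quickly set up both a data exchange and sharing platform and a public website. A Docker image is also available: besides deploying directly on Linux or a general cloud platform, users can first install Docker on Windows or Mac and then quickly set up a multi-user lab information sharing platform without complicated software installation.

The system stores all types of experimental data (text, numerical data, files, images, audio and video) and supports listing, adding, deleting, revising, searching, classifying, project and participant management, browsing, printing, security protection, and automatic replication and backup. It can further help set up data exchange and sharing, online discussion, real-time editing, revision history management for experiment reports, management of lab consumables and reagents, and digital signatures. The system currently ships with Chinese, Japanese, and English interfaces and is available in several versions, including cloud platform versions. Thanks to an optimized web interface, mobile devices such as smartphones and tablets, as well as ordinary computers, can access and share the lab knowledge stored in the cloud through an intuitive graphical interface, free from limits of time and distance, sparking more research ideas and new directions.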

Handwritten, paper-based records cannot keep up with data of increasing volume and complexity, and they make data sharing difficult in collaborative projects across disciplines and research communities. As high-throughput biology generates an ever-growing digital deluge of output, a web platform that serves as a knowledge repository, with functions such as search, backup, and reconstruction, becomes important for daily record keeping in today's laboratories.

We have developed a purely web-based ELN framework that can be deployed on local PCs, local servers, NAS, or clouds, in place of a server/client ELN architecture that demands substantial manpower. Users can access the ELN with any web browser on a variety of machines, including mobile devices, without limits of time and place. It can therefore be adapted to manage the ideas, lab working logs, and experimental data of a single researcher; to let a small research team build its own public web service and an intranet framework for managing experimental results; and to serve as a shared working platform among labs.

Platform: Electronic Laboratory Notebook (ELN) (video introduction)

Download: Windows/Mac, Linux/Cloud version

[Slides in PDF]

A divide-and-conquer algorithm for NGS read alignment

10:00-10:40 Dr. Hsin-Nan Lin (林信男博士)

Prof. Wen-Lian Hsu's Artificial Intelligence Lab (許聞廉教授人工智慧實驗室)

Institute of Information Science, Academia Sinica

Content:

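Next-generation sequencing lets biologists investigate differences between genomes at single-nucleotide resolution. Because NGS produces massive amounts of data, analyzing NGS reads depends on fast alignment algorithms. Most traditional global alignment algorithms use the "seed-and-extend" approach to search a database for similar sequences, and such algorithms are designed around linear dynamic programming. We propose a divide-and-conquer algorithm for aligning highly similar sequences: a sequence is cut into several smaller fragments, each fragment can be processed separately, and the final global alignment is composed of the fragment alignments. Our algorithm can therefore be viewed as one that aligns sequences in parallel. In our experiments, our method is many times faster than most alignment algorithms and also produces more accurate alignments.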

Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Because of the huge amount of data, NGS applications require very fast and accurate alignment algorithms. Most existing read-mapping algorithms adopt a seed-and-extend strategy, which is sequential in nature. We developed a sequence partitioning algorithm, called Kart, which can process long reads as fast as short reads by dividing a read into small fragments that can be aligned independently. Our experimental results indicate that Kart spends much less time on longer reads than other aligners and still produces reliable alignments even when the error rate is as high as 15%. We also developed a new de novo RNA-seq mapper, called DART, which adopts a similar concept. Experiments on synthetic and real NGS datasets showed that DART is a highly efficient aligner that yields the highest or comparable sensitivity and accuracy compared to most state-of-the-art aligners and, more importantly, spends the least time among the selected aligners.

KART: https://github.com/hsinnan75/Kart
DART: https://github.com/hsinnan75/DART
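
The divide-and-conquer idea described above can be illustrated with a toy sketch. This is not Kart's actual implementation: the function names, the fixed fragment length `k`, the exact-match k-mer index, and the sample sequences are all assumptions made for illustration. Each fragment is placed independently, so one mismatching fragment does not prevent the rest from locating the read.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def split_read(read, k):
    """Divide: cut the read into non-overlapping fragments of length k."""
    return [(offset, read[offset:offset + k])
            for offset in range(0, len(read) - k + 1, k)]


def map_read(read, index, k):
    """Conquer and combine: place each fragment on its own, then vote.

    An exact hit of a fragment at reference position `pos` implies that
    the read starts at `pos - offset`. The start supported by the most
    fragments is returned together with its number of supporting fragments.
    """
    votes = Counter()
    for offset, fragment in split_read(read, k):
        for pos in index.get(fragment, []):
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Hypothetical toy data: the read is reference[9:27] with one substitution.
reference = "ACGTTGCATGCCGATAGGCTTAACGGATCCGTAGCTAGGCTA"
read = "GCCGATAGGCATAACGGA"
index = build_kmer_index(reference, 6)
start, support = map_read(read, index, 6)
print(start, support)
```

In this toy case the middle fragment contains the substitution and finds no exact hit, while the two flanking fragments agree on the same start position. A real aligner would additionally align the unmatched regions between the placed fragments.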

[Slides in PDF]

Speeding and improving genome assembly of high-depth NGS data by read subset-driven workflows

10:40-11:20 Yu-Jung Chang, Ph.D. (張育榮博士)

Prof. Jan-Ming Ho's Lab (何建明教授實驗室)

Institute of Information Science, Academia Sinica

Content:

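As the throughput of next-generation sequencing grows, high-coverage genome sequencing data are increasingly common. This makes genome assembly far more computationally demanding and time-consuming, yet it does not necessarily improve assembly quality. In this session we introduce and demonstrate the high-depth genome assembly workflow and software developed by our team, which speed up high-depth assembly while improving assembly quality. First, we outline the concepts and methods of high-quality read selection and skeleton assembly and present the accompanying assembly workflow. Next, we share our experience assembling the grouper and eel genomes. Finally, we give a software demonstration and hands-on practice.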

State-of-the-art high-throughput sequencers, such as the Illumina HiSeq 2500, are capable of generating up to 1 Tbp per run, with >80% of bases above Q30 and often >100x sequencing depth for large genomes. Assembling genomes from such high-depth NGS data is more time-consuming and more resource-demanding, but does not necessarily yield better assembly quality. In this session, we will introduce and demonstrate our genome assembly workflow and software for high-depth NGS assembly. First, we will present our ideas on the read subset selection problem and skeleton assembly, as well as the corresponding high-depth genome assembly workflow. Then we will share our experiences in assembling grouper and eel genomes. Finally, we will help the audience use the software.
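
As a rough illustration of what selecting a read subset can mean, the sketch below keeps the highest-quality reads until a target sequencing depth is reached. It is only an assumption about one plausible selection rule, not the speakers' method: the function name, the `(read_id, length, mean_quality)` tuple layout, and the sample numbers are hypothetical.

```python
def select_read_subset(reads, genome_size, target_depth):
    """Keep the best reads by mean quality until the target depth is met.

    reads: list of (read_id, length, mean_quality) tuples (assumed layout).
    Depth is estimated as total selected bases divided by genome size.
    """
    target_bases = genome_size * target_depth
    selected = []
    total_bases = 0
    # Visit reads from highest to lowest mean quality.
    for read_id, length, _quality in sorted(reads, key=lambda r: r[2], reverse=True):
        if total_bases >= target_bases:
            break
        selected.append(read_id)
        total_bases += length
    return selected, total_bases / genome_size


# Hypothetical toy input: five 100 bp reads over a 1,000 bp genome.
reads = [
    ("r1", 100, 38.0),
    ("r2", 100, 20.0),
    ("r3", 100, 35.0),
    ("r4", 100, 30.0),
    ("r5", 100, 25.0),
]
subset, depth = select_read_subset(reads, genome_size=1000, target_depth=0.3)
print(subset, depth)
```

The assembler then works on the smaller subset, which is where the time and memory savings would come from; how the subset is actually chosen in the presented workflow is covered in the session itself.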

[Slides in PDF]

Multi-Omics onLine Analysis System (MOLAS) for Expression Profiling and Whole-Genome Methylome

11:20-12:00 Shu-Hwa Chen, Ph.D. (陳淑華博士)

Lab of Systems Biology and Network Biology (http://eln.iis.sinica.edu.tw)

Institute of Information Science, Academia Sinica

Content:

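This session briefly explains NGS technology and its common applications, including platform selection and overall considerations in experimental design.

We introduce and demonstrate MOLAS, the online platform our team developed for analyzing gene expression profiles from NGS results. It supports model organisms such as human and mouse as well as several aquaculture fish species. Its graphical web interface combines straightforward analysis tools with annotation data from public databases, helping researchers better understand the data in hand and turn them into knowledge from their own professional perspective.

We also introduce and demonstrate the online whole-genome methylation analysis platform (epi-MOLAS; model organisms including human, mouse, and Arabidopsis). Using Arabidopsis as the example (TEA), we guide users through experimental design, sequencing, and preliminary data processing, show how the online system completes complex whole-genome methylation analysis through a clean interface, and explain how to combine methylation results with transcriptome data for integrated analysis.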

In this session, we will briefly cover experimental design, raw data processing, and functional analysis for high-throughput biology with next-generation sequencing.

Here we present MOLAS, the Multi-Omics onLine Analysis System for model and non-model organisms. It is a robust web application that takes gene expression data (FPKM/RPKM/CPM) from different libraries as input, maps the expressed genes to built-in annotations for further analysis, and reveals the biological meaning of complex data through an intuitive interface.
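
For readers unfamiliar with the input units mentioned above, the sketch below shows how CPM (counts per million) is conventionally computed from raw read counts for one library. It is not part of MOLAS; the function name and the gene names are hypothetical.

```python
def counts_to_cpm(counts):
    """Convert raw read counts for one library to counts per million (CPM).

    counts: dict mapping gene name to raw read count.
    CPM = count * 1,000,000 / total counts in the library.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical library with 10,000 mapped reads in total.
library = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_to_cpm(library)
print(cpm)
```

CPM only corrects for library size. FPKM and RPKM additionally divide by gene length, so values in different units should not be mixed within one comparison.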

DNA methylation is an important regulator of genome function. It affects the binding affinity between DNA and DNA-binding proteins, leading to a variety of biological outcomes. By combining bisulfite conversion of genomic DNA with next-generation sequencing (BS-Seq), the 5-methylcytosine level of every covered C residue can be measured at whole-genome scale. To facilitate access to BS-Seq data for researchers working on the model plant Arabidopsis, we built the TEA workbench. We have also implemented the same platform for human and mouse to decipher the secrets hidden in DNA methylation.
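
The per-site methylation level from BS-Seq is commonly estimated from read counts: after bisulfite conversion, unmethylated cytosines read as T while methylated ones still read as C. The sketch below shows that standard ratio; it is not taken from TEA or epi-MOLAS, and the function name and counts are hypothetical.

```python
def methylation_level(c_reads, t_reads):
    """Estimate the methylation level of one cytosine site from BS-Seq reads.

    c_reads: reads showing C at the site (cytosine protected, i.e. methylated).
    t_reads: reads showing T at the site (cytosine converted, i.e. unmethylated).
    Returns a fraction between 0 and 1, or None if the site is not covered.
    """
    covered = c_reads + t_reads
    if covered == 0:
        return None
    return c_reads / covered


# Hypothetical site covered by 10 reads, 8 of which still show C.
level = methylation_level(8, 2)
print(level)
```

Sites with low coverage give noisy estimates, so analyses usually apply a minimum coverage threshold before comparing levels between samples.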

Demo of MOLAS: http://molas.iis.sinica.edu.tw/molas_demo/

Help page of MOLAS: http://molas.iis.sinica.edu.tw/help.html

Demo of epi-MOLAS: http://symbiosis.iis.sinica.edu.tw/epimolas/molas.html

TEA: http://tea.iis.sinica.edu.tw

[Slides in PDF]
